From goran.brostrom at gmail.com  Sat Oct  1 00:51:08 2011
From: goran.brostrom at gmail.com (Göran Broström)
Date: Sat, 1 Oct 2011 00:51:08 +0200
Subject: [R] coxreg vs coxph: time-dependent treatment
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From dwinsemius at comcast.net  Sat Oct  1 03:54:22 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Fri, 30 Sep 2011 21:54:22 -0400
Subject: [R] need help with contourplot figure
In-Reply-To:
References:
Message-ID:

On Sep 30, 2011, at 2:10 PM, Mike Gibson wrote:

> I can't figure out how to add tick marks on both my X and Y axis.
> For example, my X axis ranges from 0 to 1 and there are both a tick
> mark and a number label at the X-axis values of 0.2, 0.4, 0.6 and
> 0.8. I want to add tick marks to the figure at every 0.1 value.
> This will help a viewer determine the values on the x axis.
>
> I included all of my code.

But no data. So the only testing was done on the help page example.

> I apologize but it is very long. I am creating a contourplot figure
> that will be a jpg. I also included my notes after the # sign so
> you can see what I am doing. Any help would be greatly appreciated.

The usual way to modify the default features and locations of ticks inside lattice plots is with the `scales` argument. To the help(contourplot) example I added:

scales=list(x=list(at=seq(4,20, by=5))), ...

and instead of ticks at 5, 10, 15, 20, I get them at 4, 9, 14, 19.

?xyplot

--
David.
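A minimal sketch of the `scales` approach applied to Mike's actual request (a tick every 0.1 on the x axis). His data were not posted, so `grid_df` and its `YPR` surface below are made up purely so the call runs; only the `scales` argument is the point. Labelling only every second tick through `labels` is an assumption about the look he wanted.

```r
library(lattice)

# Hypothetical stand-in for the poster's yprplot2 data (not the real data)
grid_df <- expand.grid(F = seq(0, 1, by = 0.05), Length = seq(25, 40, by = 1))
grid_df$YPR <- with(grid_df, 7.5 * (1 - exp(-3 * F)) * (Length / 40))

x_at <- seq(0, 1, by = 0.1)                  # a tick every 0.1
# Label every second tick (0.0, 0.2, ..., 1.0); leave the others blank
x_lab <- ifelse(seq_along(x_at) %% 2 == 1, format(x_at), "")

p <- contourplot(YPR ~ F * Length, data = grid_df,
                 xlim = c(0, 1), ylim = c(25, 40),
                 xlab = "Fishing Mortality Rate",
                 ylab = "Minimum Size (inches)",
                 scales = list(x = list(at = x_at, labels = x_lab),
                               y = list(at = seq(25, 40, by = 1))))
print(p)
```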
> jpeg(file="C:/Documents and Settings/Michael/My Documents/Mike/amberjack/Reefs_Model/YPRlevel.jpg",
>      width=8, height=8, units="in", res=300)  # location of file and size
> x <- contourplot(YPR ~ F*Length, data=yprplot2,
>        at = c(2.0,3.0,4.0,5.0,5.5,6.0,6.25,6.5,6.75,7.0,7.12,7.25,7.35,7.45,7.5),
>        ylim=c(25,40), xlim=c(0,1), xlab = "Fishing Mortality Rate",
>        ylab = "Minimum Size (inches)",
>        panel=function(...){  # adds the points for the current YPR value; ... passes the remaining arguments on
>          panel.levelplot(...)
>          grid.points(0.609, 30, pch=8)  # pch is the plotting character (8 is an asterisk; 19 is a closed circle)
>          grid.points(0.333, 30, pch=8)
>          grid.points(eumetric$F, eumetric$Length, pch=18, gp=gpar(col="black", cex=.9))  # add the eumetric line as points
>        })
> # now add the text for the current YPR location
> print(x)  # this draws the figure I already made
> grid.text('Fcurrent', 0.62, 0.35, gp=gpar(col="black", cex=1))  # 0.62 and 0.35 are the text position on the page (npc units)
> grid.text('Fmsy', 0.38, 0.35, gp=gpar(col="black", cex=1))
> dev.off()  # the JPEG file is not written until the device is closed
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From joshb41 at yahoo.com  Sat Oct  1 00:10:16 2011
From: joshb41 at yahoo.com (Josh B)
Date: Fri, 30 Sep 2011 15:10:16 -0700 (PDT)
Subject: [R] Data simulation for ANOVA decomposition into sums of squares
Message-ID: <1317420616.69249.YahooMailNeo@web110103.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From francois.pepin at sequentainc.com  Sat Oct  1 00:25:25 2011
From: francois.pepin at sequentainc.com (Francois Pepin)
Date: Fri, 30 Sep 2011 15:25:25 -0700
Subject: [R] Hi
In-Reply-To:
References: <9CCFD2B2-373B-40B1-A57E-80845AB06977@sequentainc.com>
Message-ID: <45294E56-D201-4B85-82F4-E9ACE208196C@sequentainc.com>

Hi Jiang,

where did you get that definition of the Benjamini-Hochberg correction? That is simply not how it works. You can take a look at the original paper; it is available online (http://www.math.tau.ac.il/~ybenja/MyPapers/benjamini_hochberg1995.pdf). You can also look at the p.adjust function to see how it is implemented in R.

Also, please keep the replies on the list. They're archived so that other people can refer to them later, and it also gives other people the opportunity to reply if they have more to add.

Francois

On Sep 30, 2011, at 15:07 , chunjiang he wrote:

> Hi Francois,
>
> Thanks for your reply. I did the BH correction manually, just using corrected pvalue=pvalue*(n/n-1), and I got the result below. The p-values computed manually and by the R package are different.
>
> title                   r         raw       R package  manual
> hsa-miR-205--GATA3      0.797883  1.08E-13  1.08E-12   1.08E-12
> hsa-miR-205--ITGB4      0.750218  1.85E-11  8.20E-11   9.25E-11
> hsa-miR-187--PGF        0.797604  3.24E-11  8.20E-11   1.08E-10
> hsa-miR-205--SERPINB5   0.744125  3.28E-11  8.20E-11   8.20E-11
> hsa-miR-205--PBX1       0.734487  7.89E-11  1.58E-10   1.58E-10
> hsa-miR-205--MCC        0.724999  1.80E-10  3.00E-10   3.00E-10
> hsa-miR-205--WNT5B      0.717705  3.33E-10  3.78E-10   4.76E-10
> hsa-miR-200c--PKN2      0.721747  3.46E-10  3.78E-10   4.33E-10
> hsa-miR-200c--PCYOX1    0.721698  3.48E-10  3.78E-10   3.87E-10
> hsa-miR-200c--WDR68     0.72068   3.78E-10  3.78E-10   3.78E-10
>
> So I am confused about that.
>
> Best,
> Jiang
>
> On Fri, Sep 30, 2011 at 4:53 PM, Francois Pepin wrote:
> Hi Jiang,
>
> you'll have to give us some more information than this: a reproducible example and why you expect things to be different.
>
> This method has been tested extensively so we'd need something more specific if we are to help you.
>
> Francois
>
> On Sep 30, 2011, at 14:40 , chunjiang he wrote:
>
> > Hi,
> >
> > There is a question that I am confused about.
> > I have a set of data like this:
> >
> > hsa-miR-205--GATA3     0.797882767  1.08E-13
> > hsa-miR-205--ITGB4     0.750217593  1.85E-11
> > hsa-miR-187--PGF       0.797604155  3.24E-11
> > hsa-miR-205--SERPINB5  0.744124886  3.28E-11
> > hsa-miR-205--PBX1      0.734487224  7.89E-11
> > hsa-miR-205--MCC       0.72499934   1.80E-10
> > hsa-miR-205--WNT5B     0.717705259  3.33E-10
> > hsa-miR-200c--PKN2     0.721746815  3.46E-10
> > hsa-miR-200c--PCYOX1   0.721698034  3.48E-10
> > hsa-miR-200c--WDR68    0.72068017   3.78E-10
> >
> > And I want to do the Benjamini & Hochberg correction.
> >
> > So I run:
> >
> > rm(list=ls())
> > a<-read.csv("1-correlation.txt",sep="\t",header=F,quote="")
> > c<-p.adjust(a$V3,"BH")
> > a[,4]<-c
> > write.table(a,"zz.txt",sep="\t")
> >
> > And I got the result:
> >
> > hsa-miR-205--GATA3     0.797882767  1.08E-13  1.08E-12
> > hsa-miR-205--ITGB4     0.750217593  1.85E-11  8.20E-11
> > hsa-miR-187--PGF       0.797604155  3.24E-11  8.20E-11
> > hsa-miR-205--SERPINB5  0.744124886  3.28E-11  8.20E-11
> > hsa-miR-205--PBX1      0.734487224  7.89E-11  1.58E-10
> > hsa-miR-205--MCC       0.72499934   1.80E-10  3.00E-10
> > hsa-miR-205--WNT5B     0.717705259  3.33E-10  3.78E-10
> > hsa-miR-200c--PKN2     0.721746815  3.46E-10  3.78E-10
> > hsa-miR-200c--PCYOX1   0.721698034  3.48E-10  3.78E-10
> > hsa-miR-200c--WDR68    0.72068017   3.78E-10  3.78E-10
> >
> > When I checked it, I found that some adjusted p-values are not the same as the results
> > I got manually.
> >
> > Can anyone help with this?
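For readers of this thread: the two columns differ because Benjamini-Hochberg is a step-up procedure. Multiplying each p-value by n/i (i = its rank) is only the first step; the adjusted values are then made monotone by taking a running minimum starting from the largest p-value, so, for example, ITGB4 (rank 2) inherits the smaller value 8.20E-11 from SERPINB5 (rank 4). The sketch below reproduces `p.adjust(p, "BH")` by hand on the ten raw p-values from the table; the function name `bh_by_hand` is only for illustration.

```r
# Raw p-values from the posted table (already in increasing order)
p <- c(1.08e-13, 1.85e-11, 3.24e-11, 3.28e-11, 7.89e-11,
       1.80e-10, 3.33e-10, 3.46e-10, 3.48e-10, 3.78e-10)

bh_by_hand <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)      # largest p-value first
  i <- n:1                              # rank of each of those p-values (increasing order)
  adj <- pmin(1, cummin(n / i * p[o]))  # step-up: running minimum of p * n / i
  adj[order(o)]                         # back to the original order
}

naive <- p * length(p) / rank(p)        # the "manual" column: no running minimum
cbind(raw = p, naive = naive, BH = bh_by_hand(p), p.adjust = p.adjust(p, "BH"))
```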
> >
> > Thanks,
> >
> > Jiang

From jami5490 at mylaurier.ca  Sat Oct  1 00:59:22 2011
From: jami5490 at mylaurier.ca (spicymchaggis101)
Date: Fri, 30 Sep 2011 15:59:22 -0700 (PDT)
Subject: [R] error while using shapiro.test()
Message-ID: <1317423562223-3861535.post@n4.nabble.com>

Hey all, I'm just getting used to R and I'm having issues when it comes to reading my data in rows rather than columns. Any good advice would be much appreciated!

Here is the error:

> data1 <- read.table(file.choose(),header=T)
> x1 <- c(data1[1,1:5])
> shapiro.test(x1)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
  'x' must be atomic

This is my data:

   Noun Verb adverb Adjective Preposition Total
1    21    4      9         1           1    36
2    26    4      5         3           0    38
3    31    6      6         0           0    43
4    37    4      6         0           0    47
5    22    3      1        10           0    36
6    12    8      0         9           0    29
7    22    5      0         5           0    32
8    12   11      0        12           0    35
9    27    7      1        10           0    45
10   20    4      1        10           1    36
11   21    6      1         6           0    34
12   29    1      0         8           0    38
13   16   19      0         8           0    43
14   23    6      2         5           0    36
15   29    5      0         2           0    36
16   33    7      0         3           0    43
17   29    4      0         2           0    35
18   24    7      0         7           1    39
19   28    2      1         3           0    34
20   32    5      0         7           0    44
21   19    6      0         5           0    30
22   19    7      3         4           0    33
23   19    4      1         5           1    30
24   29    8      3         4           0    44
25   34    9      0         1           0    44
26   27    9      1         3           0    40
27   13   11      5         9           0    38
28   17    7      0        10           0    34
29   21    6      0         7           0    34
30   28    4      0         5           0    37

--
View this message in context: http://r.789695.n4.nabble.com/error-while-using-shapiro-test-tp3861535p3861535.html
Sent from the R help mailing list archive at Nabble.com.

From LJPJARAMILLO at hotmail.com  Sat Oct  1 03:03:37 2011
From: LJPJARAMILLO at hotmail.com (LUIS JARAMILLO)
Date: Fri, 30 Sep 2011 20:03:37 -0500
Subject: [R] R manual in Spanish
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From koshihaku at gmail.com  Sat Oct  1 03:31:08 2011
From: koshihaku at gmail.com (koshihaku)
Date: Fri, 30 Sep 2011 18:31:08 -0700 (PDT)
Subject: [R] Is the output of survfit.coxph survival or baseline survival?
Message-ID: <1317432668597-3861919.post@n4.nabble.com>

Dear all,

I am confused by the output of survfit.coxph. Some say that the survival given by summary(survfit.coxph) is the baseline survival S_0, but others say it is the survival S = S_0^exp(beta*x). Which one is correct?

By the way, if I use "newdata=" in survfit, does that mean the survival is estimated at the covariate values in the new data frame?

Thank you very much!

Koshihaku

From 1 at VictoriasJourney.com  Sat Oct  1 05:12:29 2011
From: 1 at VictoriasJourney.com (Victoria_Stuart)
Date: Fri, 30 Sep 2011 20:12:29 -0700 (PDT)
Subject: [R] Entering data into a multi-way array?
Message-ID: <1317438749773-3862054.post@n4.nabble.com>

Hello: I am a novice R user, but I have been working my way through the manuals / tutorials, ... I have R / Deducer up and running, and know the basics.

I want to analyze a microarray (gene expression) dataset.

I need to input the data into R as a multidimensional (multi-way) array, something on the order of

15,000 x 3 x 8 x 2  [genes x replicates x time points x treatments]

I've Google'd, etc. to find the solution, but I cannot find an answer. I am being led to the idea (I suspect) that I may need to use an RDBMS (like MySQL) for that purpose (section 4 in http://cran.r-project.org/doc/manuals/R-data.html)?

If so, that is my next challenge!

Any replies much appreciated!
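A sketch of the idea, raised in the reply further down this thread, that a 4-D array is filled just like a matrix. It assumes, hypothetically, that the values can be read as one long vector ordered gene fastest, then replicate, time point and treatment (R's column-major order); 5 genes stand in for the real 15,000. At 15,000 x 3 x 8 x 2 that is 720,000 numbers, which fits comfortably in memory, so a database should not be necessary for this step.

```r
n_genes <- 5                          # 15000 in the real data set
dims <- c(n_genes, 3, 8, 2)
dim_names <- list(gene      = paste0("gene", seq_len(n_genes)),
                  replicate = paste0("rep", 1:3),
                  time      = paste0("t", 1:8),
                  treatment = c("control", "treated"))

# Hypothetical long vector of expression values in column-major order
values <- seq_len(prod(dims))

expr <- array(values, dim = dims, dimnames = dim_names)

dim(expr)                                   # 5 3 8 2
expr["gene2", "rep1", "t3", "treated"]      # a single measurement
rep_mean <- apply(expr, c(1, 3, 4), mean)   # average over the replicates
dim(rep_mean)                               # 5 8 2
```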
Sincerely, Victoria :-)

From dana.sevak at yahoo.com  Sat Oct  1 06:44:16 2011
From: dana.sevak at yahoo.com (Dana Sevak)
Date: Fri, 30 Sep 2011 21:44:16 -0700 (PDT)
Subject: [R] Help with cast/reshape
Message-ID: <1317444256.16600.YahooMailNeo@web121819.mail.ne1.yahoo.com>

I realize that this is terribly basic, but I just don't seem to see it at this moment, so I would very much appreciate your help.

How shall I transform this data frame:

> df1
  Name Index Value
1    a     1   0.1
2    a     2   0.2
3    a     3   0.3
4    a     4   0.4
5    b     1   2.1
6    b     2   2.2
7    b     3   2.3
8    b     4   2.4

into this data frame:

> df2
  Index   a   b
1     1 0.1 2.1
2     2 0.2 2.2
3     3 0.3 2.3
4     4 0.4 2.4

df1 = data.frame(c("a", "a", "a", "a", "b", "b", "b", "b"), c(1,2,3,4,1,2,3,4), c(0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4))
colnames(df1) = c("Name", "Index", "Value")

df2 = data.frame(c(1,2,3,4), c(0.1, 0.2, 0.3, 0.4), c(2.1, 2.2, 2.3, 2.4))
colnames(df2) = c("Index", "a", "b")

Thank you very much.

Dana

From connerpharmd at yahoo.com  Sat Oct  1 08:00:02 2011
From: connerpharmd at yahoo.com (Chris Conner)
Date: Fri, 30 Sep 2011 23:00:02 -0700 (PDT)
Subject: [R] Returning vector of values shared across 3 vectors?
Message-ID: <1317448802.24157.YahooMailNeo@web160705.mail.bf1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From michael.weylandt at gmail.com  Sat Oct  1 08:02:00 2011
From: michael.weylandt at gmail.com (R. Michael Weylandt)
Date: Sat, 1 Oct 2011 02:02:00 -0400
Subject: [R] Entering data into a multi-way array?
In-Reply-To: <1317438749773-3862054.post@n4.nabble.com>
References: <1317438749773-3862054.post@n4.nabble.com>
Message-ID:

You might want to describe how your data is kept now so we can help with the loading process, but in general, 4D arrays are loaded with the same basic principles as 2D arrays (matrices).

If you are new to this kind of work in R, it sounds like you might want to check out the Bioconductor lists for specifically microbiology/genetics questions.

Michael Weylandt

On Fri, Sep 30, 2011 at 11:12 PM, Victoria_Stuart <1 at victoriasjourney.com> wrote:
> Hello: I am a novice R user, but I have been working my way through the
> manuals / tutorials, ... I have R / Deducer up and running, and know the
> basics.
>
> I want to analyze a microarray (gene expression) dataset.
>
> I need to input the data into R as a multidimensional (multi-way) array,
> something on the order of
>
> 15,000 x 3 x 8 x 2  [genes x replicates x time points x treatments]
>
> I've Google'd, etc. to find the solution, but I cannot find an answer. I am
> being led to the idea (I suspect) that I may need to use an RDBMS (like
> MySQL) for that purpose (section 4 in
> http://cran.r-project.org/doc/manuals/R-data.html)?
>
> If so, that is my next challenge!
>
> Any replies much appreciated!
>
> Sincerely, Victoria :-)

From michael.weylandt at gmail.com  Sat Oct  1 08:09:41 2011
From: michael.weylandt at gmail.com (R. Michael Weylandt)
Date: Sat, 1 Oct 2011 02:09:41 -0400
Subject: [R] Returning vector of values shared across 3 vectors?
In-Reply-To: <1317448802.24157.YahooMailNeo@web160705.mail.bf1.yahoo.com>
References: <1317448802.24157.YahooMailNeo@web160705.mail.bf1.yahoo.com>
Message-ID:

Try this:

shared <- vec3[ (vec3 %in% vec1) & (vec3 %in% vec2)]

Michael

On Sat, Oct 1, 2011 at 2:00 AM, Chris Conner wrote:
> Help-Rs,
>
> I've got three vectors representing participants:
>
> vec1 <- c(4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81)
> vec2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66)
> vec3 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,52)
>
> I'd like to return a vector that contains only the values that are shared across ALL THREE vectors. So the statement would return a vector that looked like this:
> 4,5,6,7,8,9,10,11,12,52
>
> For some reason I initially thought that a cbind and a unique() would handle it, but then common sense sank in. I think the sleep deprivation is starting to take its toll. I've got to believe that there is a simple solution to this dilemma.
>
> Thanks in advance for any help!
> C

From michael.weylandt at gmail.com  Sat Oct  1 08:33:16 2011
From: michael.weylandt at gmail.com (R. Michael Weylandt)
Date: Sat, 1 Oct 2011 02:33:16 -0400
Subject: [R] error while using shapiro.test()
In-Reply-To: <1317423562223-3861535.post@n4.nabble.com>
References: <1317423562223-3861535.post@n4.nabble.com>
Message-ID:

Mr McHaggis,

I'm not sure what "i'm having issues when it comes to reading my data in rows rather than columns" means, but here are some thoughts on your problem:

ExData <- structure(list(
    Noun = c(21L, 26L, 31L, 37L, 22L, 12L, 22L, 12L, 27L, 20L, 21L, 29L, 16L, 23L, 29L, 33L, 29L, 24L, 28L, 32L, 19L, 19L, 19L, 29L, 34L, 27L, 13L, 17L, 21L, 28L),
    Verb = c(4L, 4L, 6L, 4L, 3L, 8L, 5L, 11L, 7L, 4L, 6L, 1L, 19L, 6L, 5L, 7L, 4L, 7L, 2L, 5L, 6L, 7L, 4L, 8L, 9L, 9L, 11L, 7L, 6L, 4L),
    adverb = c(9L, 5L, 6L, 6L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 3L, 1L, 3L, 0L, 1L, 5L, 0L, 0L, 0L),
    Adjective = c(1L, 3L, 0L, 0L, 10L, 9L, 5L, 12L, 10L, 10L, 6L, 8L, 8L, 5L, 2L, 3L, 2L, 7L, 3L, 7L, 5L, 4L, 5L, 4L, 1L, 3L, 9L, 10L, 7L, 5L),
    Preposition = c(1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
    Total = c(36L, 38L, 43L, 47L, 36L, 29L, 32L, 35L, 45L, 36L, 34L, 38L, 43L, 36L, 36L, 43L, 35L, 39L, 34L, 44L, 30L, 33L, 30L, 44L, 44L, 40L, 38L, 34L, 34L, 37L)),
  .Names = c("Noun", "Verb", "adverb", "Adjective", "Preposition", "Total"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"))

x1 <- ExData[1, 1:5]
## Why do you use c() here to combine a single line with itself?
## This actually converts your data frame to a list, which causes your problem.

shapiro.test(x1)
## Still will not work because data frames don't coerce to vectors;
## perhaps you meant to just extract the values, in which case you need to add
shapiro.test(as.numeric(x1))
# but it seems more likely that you mean to get a column's worth of data:
shapiro.test(ExData[1:5,1])

Does this help?

Michael Weylandt

If you have some basic programming background, take a look at chapter 2 of the R Language Definition for a solid explanation of what all this talk of numeric/atomic/data frame/list is about.

On Fri, Sep 30, 2011 at 6:59 PM, spicymchaggis101 wrote:
> hey all, I'm just getting used to R and i'm having issues when it comes to
> reading my data in rows rather than columns. any good advice would be much
> appreciated !
>
> here is the error:
>
>> data1 <- read.table(file.choose(),header=T)
>> x1 <- c(data1[1,1:5])
>> shapiro.test(x1)
> Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
>   'x' must be atomic
>
> This is my data:
>
>    Noun Verb adverb Adjective Preposition Total
> 1    21    4      9         1           1    36
> 2    26    4      5         3           0    38
> 3    31    6      6         0           0    43
> 4    37    4      6         0           0    47
> 5    22    3      1        10           0    36
> 6    12    8      0         9           0    29
> 7    22    5      0         5           0    32
> 8    12   11      0        12           0    35
> 9    27    7      1        10           0    45
> 10   20    4      1        10           1    36
> 11   21    6      1         6           0    34
> 12   29    1      0         8           0    38
> 13   16   19      0         8           0    43
> 14   23    6      2         5           0    36
> 15   29    5      0         2           0    36
> 16   33    7      0         3           0    43
> 17   29    4      0         2           0    35
> 18   24    7      0         7           1    39
> 19   28    2      1         3           0    34
> 20   32    5      0         7           0    44
> 21   19    6      0         5           0    30
> 22   19    7      3         4           0    33
> 23   19    4      1         5           1    30
> 24   29    8      3         4           0    44
> 25   34    9      0         1           0    44
> 26   27    9      1         3           0    40
> 27   13   11      5         9           0    38
> 28   17    7      0        10           0    34
> 29   21    6      0         7           0    34
> 30   28    4      0         5           0    37

From pittjj at uchicago.edu  Sat Oct  1 07:14:45 2011
From: pittjj at uchicago.edu (pittjj at uchicago.edu)
Date: Sat, 1 Oct 2011 00:14:45 -0500
Subject: [R] R manual in Spanish
In-Reply-To:
References:
Message-ID: <20111001001445.AWV07418@mstore03.uchicago.edu>

http://cran.r-project.org/doc/contrib/R-intro-1.1.0-espanol.1.pdf

From xye78 at hotmail.com  Sat Oct  1 08:12:23 2011
From: xye78 at hotmail.com (yehengxin)
Date: Fri, 30 Sep 2011 23:12:23 -0700 (PDT)
Subject: [R] Poor performance of "Optim"
Message-ID: <1317449543774-3862229.post@n4.nabble.com>

I used to consider using R and "Optim" to replace my commercial packages: Gauss and Matlab. But it turns out that "Optim" does not converge completely. The same data for Gauss and Matlab are converged very well. I see that there are too many packages based on "optim" and really doubt if they can be trusted!
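Without the poster's model there is no way to say what went wrong, but a first check is the `convergence` component that `optim()` returns: 0 means success, 1 means the `maxit` iteration limit was reached. A sketch on the Rosenbrock function (a standard test problem, not the poster's): a deliberately small `maxit` makes Nelder-Mead stop early and report it, while supplying the gradient and using `method = "BFGS"` converges.

```r
# Rosenbrock "banana" function and its analytic gradient (standard test problem)
fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                      200 * (x[2] - x[1]^2))

start <- c(-1.2, 1)

# Deliberately tiny iteration budget: optim() stops early and says so
short <- optim(start, fr, control = list(maxit = 50))
short$convergence    # 1 = iteration limit reached; do not trust short$par

# Quasi-Newton with the gradient supplied and a raised iteration limit
bfgs <- optim(start, fr, grr, method = "BFGS", control = list(maxit = 500))
bfgs$convergence     # 0 = success
bfgs$par             # close to the true minimum at c(1, 1)
```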
From rroa at azti.es  Sat Oct  1 09:31:28 2011
From: rroa at azti.es (Rubén Roa)
Date: Sat, 1 Oct 2011 09:31:28 +0200
Subject: [R] Poor performance of "Optim"
References: <1317449543774-3862229.post@n4.nabble.com>
Message-ID: <5CD78996B8F8844D963C875D3159B94A01112174@DSRCORREO.azti.local>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From jwiley.psych at gmail.com  Sat Oct  1 09:36:02 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Sat, 1 Oct 2011 00:36:02 -0700
Subject: [R] Poor performance of "Optim"
In-Reply-To: <1317449543774-3862229.post@n4.nabble.com>
References: <1317449543774-3862229.post@n4.nabble.com>
Message-ID:

Is there a question or point to your message or did you simply feel the urge to inform the entire R-help list of the things that you consider?

Josh

On Fri, Sep 30, 2011 at 11:12 PM, yehengxin wrote:
> I used to consider using R and "Optim" to replace my commercial packages:
> Gauss and Matlab. But it turns out that "Optim" does not converge
> completely. The same data for Gauss and Matlab are converged very well. I
> see that there are too many packages based on "optim" and really doubt if
> they can be trusted!

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

From simonfuller9 at gmail.com  Sat Oct  1 10:22:09 2011
From: simonfuller9 at gmail.com (sf1979)
Date: Sat, 1 Oct 2011 01:22:09 -0700 (PDT)
Subject: [R] Covariance-Variance Matrix and For Loops
In-Reply-To:
References: <1317380432602-3859441.post@n4.nabble.com> <1317405259405-3860580.post@n4.nabble.com>
Message-ID: <1317457329612-3862347.post@n4.nabble.com>

Hello again,

sapply works. However, it does not explicitly call a simplify function, but rather seems to handle the case within its own body of code. I should be able to figure out basically what simplify2array does from the code though.

function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
{
    FUN <- match.fun(FUN)
    answer <- lapply(X, FUN, ...)
    if (USE.NAMES && is.character(X) && is.null(names(answer)))
        names(answer) <- X
    if (simplify && length(answer) && length(common.len <- unique(unlist(lapply(answer,
        length)))) == 1L) {
        if (common.len == 1L)
            unlist(answer, recursive = FALSE)
        else if (common.len > 1L) {
            r <- as.vector(unlist(answer, recursive = FALSE))
            if (prod(d <- c(common.len, length(X))) == length(r))
                array(r, dim = d, dimnames = if (!(is.null(n1 <- names(answer[[1L]])) &
                  is.null(n2 <- names(answer))))
                  list(n1, n2))
            else answer
        }
        else answer
    }
    else answer
}
From daniel at umd.edu  Sat Oct  1 11:04:03 2011
From: daniel at umd.edu (Daniel Malter)
Date: Sat, 1 Oct 2011 02:04:03 -0700 (PDT)
Subject: [R] Help with cast/reshape
In-Reply-To: <1317444256.16600.YahooMailNeo@web121819.mail.ne1.yahoo.com>
References: <1317444256.16600.YahooMailNeo@web121819.mail.ne1.yahoo.com>
Message-ID: <1317459843368-3862404.post@n4.nabble.com>

library(reshape)
df2<-melt(df1, id.vars=c("Name","Index"))  # keep Index as an id variable so cast() can use it
df3<-cast(df2,Index~Name)
df3

HTH,
Daniel

Dana Sevak wrote:
>
> I realize that this is terribly basic, but I just don't seem to see it at
> this moment, so I would very much appreciate your help.
>
> How shall I transform this data frame:
>
>> df1
>   Name Index Value
> 1    a     1   0.1
> 2    a     2   0.2
> 3    a     3   0.3
> 4    a     4   0.4
> 5    b     1   2.1
> 6    b     2   2.2
> 7    b     3   2.3
> 8    b     4   2.4
>
> into this data frame:
>
>> df2
>   Index   a   b
> 1     1 0.1 2.1
> 2     2 0.2 2.2
> 3     3 0.3 2.3
> 4     4 0.4 2.4
>
> df1 = data.frame(c("a", "a", "a", "a", "b", "b", "b", "b"),
> c(1,2,3,4,1,2,3,4), c(0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4))
> colnames(df1) = c("Name", "Index", "Value")
>
> df2 = data.frame(c(1,2,3,4), c(0.1, 0.2, 0.3, 0.4), c(2.1, 2.2, 2.3, 2.4))
> colnames(df2) = c("Index", "a", "b")
>
> Thank you very much.
>
> Dana

From caspervector at gmail.com  Sat Oct  1 11:28:52 2011
From: caspervector at gmail.com (Casper Ti. Vector)
Date: Sat, 1 Oct 2011 17:28:52 +0800
Subject: [R] Problem with logarithmic nonlinear model using nls() from the `stats' package
Message-ID: <20111001092851.GA19097@CasperVector>

Example:

> f <- function(x) { 1 + 2 * log(1 + 3 * x) + rnorm(1, sd = 0.5) }
> y <- f(x <- c(1 : 10)); y
 [1] 4.503841 5.623073 6.336423 6.861151 7.276430 7.620131 7.913338 8.169004
 [9] 8.395662 8.599227
> nls(x ~ a + b * log(1 + c * x), start = list(a = 1, b = 2, c = 3), trace = TRUE)
37.22954 :  1 2 3
Error in numericDeriv(form[[3L]], names(ind), env) :
  Missing value or an infinity produced when evaluating the model
In addition: Warning message:
In log(1 + c * x) : NaNs produced

What's wrong here? Am I handling this problem in the wrong way? Any suggestions are welcome, thanks :)

--
Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134, valid from 2010 to 2013) from a key server.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL:

From jholtman at gmail.com  Sat Oct  1 12:03:56 2011
From: jholtman at gmail.com (jim holtman)
Date: Sat, 1 Oct 2011 06:03:56 -0400
Subject: [R] Returning vector of values shared across 3 vectors?
In-Reply-To: <1317448802.24157.YahooMailNeo@web160705.mail.bf1.yahoo.com>
References: <1317448802.24157.YahooMailNeo@web160705.mail.bf1.yahoo.com>
Message-ID:

try this:

> vec1 <- c(4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81)
> vec2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66)
> vec3 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,52)
> intersect(vec1,intersect(vec2, vec3))
 [1]  4  5  6  7  8  9 10 11 12 52
>

On Sat, Oct 1, 2011 at 2:00 AM, Chris Conner wrote:
> Help-Rs,
>
> I've got three vectors representing participants:
>
> vec1 <- c(4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81)
> vec2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66)
> vec3 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,52)
>
> I'd like to return a vector that contains only the values that are shared across ALL THREE vectors. So the statement would return a vector that looked like this:
> 4,5,6,7,8,9,10,11,12,52
>
> For some reason I initially thought that a cbind and a unique() would handle it, but then common sense sank in. I think the sleep deprivation is starting to take its toll. I've got to believe that there is a simple solution to this dilemma.
>
> Thanks in advance for any help!
> C

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
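Jim's nested `intersect()` extends to any number of vectors with `Reduce()`, which folds `intersect` over a list. A sketch using the same values as the three vectors in this thread, written with `:` ranges for brevity:

```r
vec1 <- c(4:12, 44:81)
vec2 <- c(1:12, 44:66)
vec3 <- c(1:17, 52)

# intersect(intersect(vec1, vec2), vec3), generalised to a list of any length
shared <- Reduce(intersect, list(vec1, vec2, vec3))
shared
```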
From marc_grt at yahoo.fr Sat Oct 1 12:04:57 2011 From: marc_grt at yahoo.fr (Marc Girondot) Date: Sat, 01 Oct 2011 12:04:57 +0200 Subject: [R] Poor performance of "Optim" In-Reply-To: <1317449543774-3862229.post@n4.nabble.com> References: <1317449543774-3862229.post@n4.nabble.com> Message-ID: <4E86E5C9.5070309@yahoo.fr> Le 01/10/11 08:12, yehengxin a ?crit : > I used to consider using R and "Optim" to replace my commercial packages: > Gauss and Matlab. But it turns out that "Optim" does not converge > completely. What it means "completely" ? > The same data for Gauss and Matlab are converged very well. I > see that there are too many packages based on "optim" and really doubt if > they can be trusted! > > I don't understand the "too many". If a package needs an optimization, it is normal that it uses optim ! I use the same model in r, Excel solver (the new version is rather good) or Profit (a mac software, very powerful) and r is rather one of the best solution. But they are many different choices that can influence the optimization. You must give an example of the problem. I find some convergence problem when the criteria to be minimized is the result of a stochastic model (ie if the same set of parameters produce different objective value depending on the run). In this case the fit stops prematurely and the method SANN should be preferred. In conclusion, give us more information but take into account that non-linear optimization is a complex world ! Marc -- __________________________________________________________ Marc Girondot, Pr Laboratoire Ecologie, Syst?matique et Evolution Equipe de Conservation des Populations et des Communaut?s CNRS, AgroParisTech et Universit? 
Paris-Sud 11, UMR 8079
Bâtiment 362
91405 Orsay Cedex, France

Tel: 33 1 (0)1.69.15.72.30 Fax: 33 1 (0)1.69.15.73.53
e-mail: marc.girondot at u-psud.fr
Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html
Skype: girondot

From spencer.graves at structuremonitoring.com Sat Oct 1 12:49:34 2011
From: spencer.graves at structuremonitoring.com (Spencer Graves)
Date: Sat, 01 Oct 2011 03:49:34 -0700
Subject: [R] Poor performance of "Optim"
In-Reply-To: <4E86E5C9.5070309@yahoo.fr>
References: <1317449543774-3862229.post@n4.nabble.com> <4E86E5C9.5070309@yahoo.fr>
Message-ID: <4E86F03E.7050702@structuremonitoring.com>

Have you considered the "optimx" package? I haven't tried it, but it was produced by a team of leading researchers in nonlinear optimization, including those who wrote most of "optim" (http://user2010.org/tutorials/Nash.html) years ago.

There is a team actively working on this. If you could provide specific examples where Gauss and Matlab outperformed the alternatives you've tried in R, especially if Gauss and Matlab outperformed optimx, I believe they would be interested.

As previously noted, nonlinear optimization is a difficult problem. An overview of alternatives available in R, including optim and optimx, is available in the CRAN Task View on optimization (http://cran.fhcrc.org/web/views/Optimization.html).

Hope this helps.
Spencer

On 10/1/2011 3:04 AM, Marc Girondot wrote:
> On 01/10/11 08:12, yehengxin wrote:
>> I used to consider using R and "Optim" to replace my commercial
>> packages:
>> Gauss and Matlab. But it turns out that "Optim" does not converge
>> completely.
> What does "completely" mean?
>> The same data for Gauss and Matlab are converged very well. I
>> see that there are too many packages based on "optim" and really
>> doubt if
>> they can be trusted!
>>
>>
> I don't understand the "too many". If a package needs an optimization,
> it is normal that it uses optim!
>
> I use the same model in R, Excel Solver (the new version is rather
> good) or Profit (Mac software, very powerful), and R is rather one of
> the best solutions. But there are many different choices that can
> influence the optimization. You must give an example of the problem.
> I have run into convergence problems when the criterion to be minimized is
> the output of a stochastic model (i.e. when the same set of parameters
> produces a different objective value on each run). In that case
> the fit stops prematurely, and method "SANN" should be preferred.
> In conclusion, give us more information, but keep in mind that
> non-linear optimization is a complex world!
> Marc

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com

From antoine.paccard at unine.ch Sat Oct 1 12:26:11 2011
From: antoine.paccard at unine.ch (Antoine)
Date: Sat, 1 Oct 2011 03:26:11 -0700 (PDT)
Subject: [R] Adding axis to an ellipse: "ellipse" package
In-Reply-To: <4E838670.8070504@xtra.co.nz>
References: <1317138835047-3847954.post@n4.nabble.com> <4E82CCC6.3030904@xtra.co.nz> <9ED12467-7CFE-4B3F-9EFE-1AD186EE1EB4@unine.ch> <4E838670.8070504@xtra.co.nz>
Message-ID: <1317464771788-3862491.post@n4.nabble.com>

Dear Rolf,

I tried to follow your advice, but the results I am getting still seem strange to me.
See below an example of a matrix:

datamat <- matrix(c(2.2, 0.4, 0.4, 2.8), 2, 2)
plot(ellipse(datamat),type='l')
eigenval <- eigen(datamat)$values
eigenvect <- eigen(datamat)$vectors
eigenscl <- eigenvect * sqrt(eigenval) * (qchisq(0.95,2))# One solution to get rescale
v1 <- (eigenvect[,1])*(sqrt(eigenval[1]))*(qchisq(0.95,2))#or directly rescale the vectors needed
v2 <- (eigenvect[,2])*(sqrt(eigenval[2]))*(qchisq(0.95,2))
#Or
v1 <- eigenscl[1,]
v2 <- eigenscl[2,]
segments(-v1[1],-v1[2],v1[1],v1[2])
segments(-v2[1],-v2[2],v2[1],v2[2])

The vectors don't seem to be scaled properly and I don't see what I am doing wrong. Any ideas?

Thanks!

Antoine

--
View this message in context: http://r.789695.n4.nabble.com/Adding-axis-to-an-ellipse-ellipse-package-tp3847954p3862491.html
Sent from the R help mailing list archive at Nabble.com.

From omphalodes.verna at yahoo.com Sat Oct 1 13:21:50 2011
From: omphalodes.verna at yahoo.com (Omphalodes Verna)
Date: Sat, 1 Oct 2011 04:21:50 -0700 (PDT)
Subject: [R] class definition
Message-ID: <1317468110.78375.YahooMailNeo@web124916.mail.ne1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From hyu0401 at hotmail.com Sun Oct 2 03:27:45 2011
From: hyu0401 at hotmail.com (YuHong)
Date: Sat, 1 Oct 2011 18:27:45 -0700
Subject: [R] Can I tell about someone's academic cheating
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From rroa at azti.es Sat Oct 1 14:51:27 2011
From: rroa at azti.es (Rubén Roa)
Date: Sat, 1 Oct 2011 14:51:27 +0200
Subject: [R] Can I tell about someone's academic cheating
References:
Message-ID: <5CD78996B8F8844D963C875D3159B94A01112176@DSRCORREO.azti.local>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From ggrothendieck at gmail.com Sat Oct 1 15:27:34 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Sat, 1 Oct 2011 09:27:34 -0400
Subject: [R] Problem with logarithmic nonlinear model using nls() from the `stats' package
In-Reply-To: <20111001092851.GA19097@CasperVector>
References: <20111001092851.GA19097@CasperVector>
Message-ID:

On Sat, Oct 1, 2011 at 5:28 AM, Casper Ti. Vector wrote:
> Example:
>
>> f <- function(x) { 1 + 2 * log(1 + 3 * x) + rnorm(1, sd = 0.5) }
>> y <- f(x <- c(1 : 10)); y
>  [1] 4.503841 5.623073 6.336423 6.861151 7.276430 7.620131 7.913338 8.169004
>  [9] 8.395662 8.599227
>> nls(x ~ a + b * log(1 + c * x), start = list(a = 1, b = 2, c = 3), trace = TRUE)
> 37.22954 :  1 2 3
> Error in numericDeriv(form[[3L]], names(ind), env) :
>   Missing value or an infinity produced when evaluating the model
> In addition: Warning message:
> In log(1 + c * x) : NaNs produced
>
> What's wrong here? Am I handling this problem in the wrong way?
> Any suggestions are welcome, thanks :)
>

It's linear given c, so calculate the residual sum of squares using lm
(or lm.fit, which is faster) given c and optimize over c:

set.seed(123) # for reproducibility

# test data
x <- 1:10
y <- 1 + 2 * log(1 + 3 * x) + rnorm(1, sd = 0.5)

# calculate residual sum of squares for best fit given c
fitc <- function(c) lm.fit(cbind(1, log(1 + c * x)), y)
rssvals <- function(c) sum(resid(fitc(c))^2)

out <- optimize(rssvals, c(0.01, 10))

which gives:

> setNames(c(coef(fitc(out$minimum)), out$minimum), letters[1:3])
        a         b         c
0.7197666 2.0000007 2.9999899

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From ggrothendieck at gmail.com Sat Oct 1 15:41:51 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Sat, 1 Oct 2011 09:41:51 -0400
Subject: [R] Problem with logarithmic nonlinear model using nls() from the `stats' package
In-Reply-To:
References: <20111001092851.GA19097@CasperVector>
Message-ID:

On Sat, Oct 1, 2011 at 9:27 AM, Gabor Grothendieck wrote:
> On Sat, Oct 1, 2011 at 5:28 AM, Casper Ti. Vector wrote:
>> Example:
>>
>>> f <- function(x) { 1 + 2 * log(1 + 3 * x) + rnorm(1, sd = 0.5) }
>>> y <- f(x <- c(1 : 10)); y
>>  [1] 4.503841 5.623073 6.336423 6.861151 7.276430 7.620131 7.913338 8.169004
>>  [9] 8.395662 8.599227
>>> nls(x ~ a + b * log(1 + c * x), start = list(a = 1, b = 2, c = 3), trace = TRUE)
>> 37.22954 :  1 2 3
>> Error in numericDeriv(form[[3L]], names(ind), env) :
>>   Missing value or an infinity produced when evaluating the model
>> In addition: Warning message:
>> In log(1 + c * x) : NaNs produced
>>
>> What's wrong here? Am I handling this problem in the wrong way?
>> Any suggestions are welcome, thanks :)
>>
>
> It's linear given c, so calculate the residual sum of squares using lm
> (or lm.fit, which is faster) given c and optimize over c:
>
> set.seed(123) # for reproducibility
>
> # test data
> x <- 1:10
> y <- 1 + 2 * log(1 + 3 * x) + rnorm(1, sd = 0.5)
>
> # calculate residual sum of squares for best fit given c
> fitc <- function(c) lm.fit(cbind(1, log(1 + c * x)), y)
> rssvals <- function(c) sum(resid(fitc(c))^2)
>
> out <- optimize(rssvals, c(0.01, 10))
>
> which gives:
>
>> setNames(c(coef(fitc(out$minimum)), out$minimum), letters[1:3])
>         a         b         c
> 0.7197666 2.0000007 2.9999899

Also, you probably intended to write 10 instead of 1 as the argument to rnorm.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From ligges at statistik.tu-dortmund.de Sat Oct 1 15:45:23 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Sat, 01 Oct 2011 15:45:23 +0200
Subject: [R] Odd gridding pattern when plotting
In-Reply-To: <359118272EA85C409A0B9BEA0B2E11DE14F3789501@its-hcwnem04.ds.Vanderbilt.edu>
References: <359118272EA85C409A0B9BEA0B2E11DE14F37894F3@its-hcwnem04.ds.Vanderbilt.edu> <359118272EA85C409A0B9BEA0B2E11DE14F37894FA@its-hcwnem04.ds.Vanderbilt.edu> <00a901cc7fa2$5a3983d0$0eac8b70$@edu> <359118272EA85C409A0B9BEA0B2E11DE14F3789501@its-hcwnem04.ds.Vanderbilt.edu>
Message-ID: <4E871973.2090605@statistik.tu-dortmund.de>

I think you found a bug introduced in R-2.13.x that has been fixed in R-2.13.2, which was released yesterday.

Best,
Uwe Ligges

On 30.09.2011 21:36, Balko, Justin wrote:
> Thanks, that kind of helps. However, some of my previous code uses functions like heatmap.2 which has multiple images (legend/color key) as well as the actual heatmap. Employing useRaster=TRUE here only applies to the heatmap and not the legend. Not a huge deal. Is there any way to set an option in R to always use rastering when drawing in the interface?
> Thanks again,
> Justin
>
> -----Original Message-----
> From: David L Carlson [mailto:dcarlson at tamu.edu]
> Sent: Friday, September 30, 2011 1:54 PM
> To: Balko, Justin; r-help at r-project.org
> Subject: RE: [R] Odd gridding pattern when plotting
>
>> From ?image
>
> " Images for large z on a regular grid are more efficient with useRaster enabled and can prevent rare anti-aliasing artifacts, but may not be supported by all graphics devices."
>
> Adding useRaster=TRUE to the two image() calls gets rid of the white grid lines.
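A minimal sketch of the fix described in the quoted reply: the ?image example from Justin's post, drawn with useRaster = TRUE in both image() calls. Sending the plot to a temporary PDF file is an assumption made only so the sketch runs without a screen device; as the quoted help text says, raster images may not be supported by every graphics device.

```r
# The ?image example from the original post
x <- y <- seq(-4 * pi, 4 * pi, length.out = 27)
r <- sqrt(outer(x^2, y^2, "+"))
z <- cos(r^2) * exp(-r / 6)

out_file <- tempfile(fileext = ".pdf")   # illustrative output location
pdf(out_file)
# useRaster = TRUE draws the regular grid as a single raster image instead of
# one rectangle per cell; ?image says this can prevent rare anti-aliasing artifacts
image(z, col = gray((0:32) / 32), useRaster = TRUE)
image(z, axes = FALSE, main = "Math can be beautiful ...",
      xlab = expression(cos(r^2) * e^{-r/6}), useRaster = TRUE)
contour(z, add = TRUE, drawlabels = FALSE)
invisible(dev.off())
```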
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Balko, Justin
> Sent: Friday, September 30, 2011 10:43 AM
> To: r-help at r-project.org
> Subject: [R] Odd gridding pattern when plotting
>
>
> Hi, I'm no longer on the subscribing list, but was hoping to get my question posted. Please inform if this is ok, although I am guessing you won't post with the image below. If so, let me know and I will resend without the image.
> Thanks
>
>
> Hi,
> I just upgraded my system and my version of R all at once. Upon running old code for heatmaps etc, I suddenly notice that there is an odd grid pattern appearing in all of my plots. An example is below:
>
> #example from ?image
>
> require(grDevices) # for colours
> x<- y<- seq(-4*pi, 4*pi, len=27)
> r<- sqrt(outer(x^2, y^2, "+"))
> image(z = z<- cos(r^2)*exp(-r/6), col=gray((0:32)/32))
> image(z, axes = FALSE, main = "Math can be beautiful ...",
>       xlab = expression(cos(r^2) * e^{-r/6}))
> contour(z, add = TRUE, drawlabels = FALSE)
>
>
>
> Any ideas what is causing this? I can't seem to figure it out. I'm not sure the bmp image can/will be posted, so maybe you can just take my word for it. It is a gridding pattern in white, that appears over the plot area only. Vertical lines are every 4 units, evenly spaced. Horizontal lines appear at every unit, then stop for a while (6-7 units, then appear every unit for 4-5 units). Simple plots like plot(x,y) do not seem to produce it, or at least I can't see it. Any ideas are helpful.
> Thanks!
>
>
> Justin M. Balko, Pharm.D., Ph.D.
> Research Fellow, Arteaga Lab
> Department of Medicine
> Division of Hematology/Oncology
> Vanderbilt University
> 777 Preston Research Building
> Nashville TN, 37232-6307
> Ph: 615-936-1495
>

From caspervector at gmail.com Sat Oct 1 15:55:38 2011
From: caspervector at gmail.com (Casper Ti. Vector)
Date: Sat, 1 Oct 2011 21:55:38 +0800
Subject: [R] Problem with logarithmic nonlinear model using nls() from the `stats' package
In-Reply-To:
References: <20111001092851.GA19097@CasperVector>
Message-ID: <20111001135538.GA15586@CasperVector>

Ah, now I see... Thanks very much :)

On Sat, Oct 01, 2011 at 09:27:34AM -0400, Gabor Grothendieck wrote:
> On Sat, Oct 1, 2011 at 5:28 AM, Casper Ti. Vector wrote:
> It's linear given c, so calculate the residual sum of squares using lm
> (or lm.fit, which is faster) given c and optimize over c:
>
> set.seed(123) # for reproducibility
> x <- 1:10
> y <- 1 + 2 * log(1 + 3 * x) + rnorm(1, sd = 0.5)
> fitc <- function(c) lm.fit(cbind(1, log(1 + c * x)), y)
> rssvals <- function(c) sum(resid(fitc(c))^2)
> out <- optimize(rssvals, c(0.01, 10))
>
> which gives:
> 0.7197666 2.0000007 2.9999899

--
Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134, valid
from 2010 to 2013) from a key server.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL:

From ligges at statistik.tu-dortmund.de Sat Oct 1 16:09:48 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Sat, 01 Oct 2011 16:09:48 +0200
Subject: [R] class definition
In-Reply-To: <1317468110.78375.YahooMailNeo@web124916.mail.ne1.yahoo.com>
References: <1317468110.78375.YahooMailNeo@web124916.mail.ne1.yahoo.com>
Message-ID: <4E871F2C.8020306@statistik.tu-dortmund.de>

On 01.10.2011 13:21, Omphalodes Verna wrote:
> Hi everybody!
>
> I have a matrix of class "myClass", for example:
>
> myMat<- matrix(rnorm(30), nrow = 6)
> attr(myMat, "class")<- "myClass"
> class(myMat)
>
> When I extract part of ''myMat'', the corresponding class ''myClass'' unfortunately disappears:
>
> myMat.p<- myMat[,1:2]
> class(myMat.p)
>
> Any advice/suggestions on how to define the class so that it does not disappear during such an operation?

You will need a "[" method for your class.

Uwe Ligges

>
> Thanks, OV
>
> [[alternative HTML version deleted]]
>

From michael.weylandt at gmail.com Sat Oct 1 16:17:53 2011
From: michael.weylandt at gmail.com (R. Michael Weylandt)
Date: Sat, 1 Oct 2011 10:17:53 -0400
Subject: [R] Covariance-Variance Matrix and For Loops
In-Reply-To: <1317457329612-3862347.post@n4.nabble.com>
References: <1317380432602-3859441.post@n4.nabble.com> <1317405259405-3860580.post@n4.nabble.com> <1317457329612-3862347.post@n4.nabble.com>
Message-ID:

Surprising: it must be a newer update than I realized... Anyway, here's the code if you want to add it manually:

simplify2array <- function (x, higher = TRUE)
{
    if (length(common.len <- unique(unlist(lapply(x, length)))) > 1L)
        return(x)
    if (common.len == 1L)
        unlist(x, recursive = FALSE)
    else if (common.len > 1L) {
        n <- length(x)
        r <- as.vector(unlist(x, recursive = FALSE))
        if (higher && length(c.dim <- unique(lapply(x, dim))) == 1 &&
            is.numeric(c.dim <- c.dim[[1L]]) &&
            prod(d <- c(c.dim, n)) == length(r)) {
            iN1 <- is.null(n1 <- dimnames(x[[1L]]))
            n2 <- names(x)
            dnam <- if (!(iN1 && is.null(n2)))
                c(if (iN1) rep.int(list(n1), length(c.dim)) else n1,
                  list(n2))
            array(r, dim = d, dimnames = dnam)
        }
        else if (prod(d <- c(common.len, n)) == length(r))
            array(r, dim = d, dimnames = if (!(is.null(n1 <- names(x[[1L]])) &
                is.null(n2 <- names(x))))
                list(n1, n2))
        else x
    }
    else x
}

On Sat, Oct 1, 2011 at 4:22 AM, sf1979 wrote:
> Hello again,
>
> sapply works.
>
> However it does not explicitly call a simplify function, but rather seems to
> handle the case within its own body of code. I should be able to figure out
> basically what simplify2array does from the code though.
>
> function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
> {
>     FUN <- match.fun(FUN)
>     answer <- lapply(X, FUN, ...)
>     if (USE.NAMES && is.character(X) && is.null(names(answer)))
>         names(answer) <- X
>     if (simplify && length(answer) && length(common.len <- unique(unlist(lapply(answer,
>         length)))) == 1L) {
>         if (common.len == 1L)
>             unlist(answer, recursive = FALSE)
>         else if (common.len > 1L) {
>             r <- as.vector(unlist(answer, recursive = FALSE))
>             if (prod(d <- c(common.len, length(X))) == length(r))
>                 array(r, dim = d, dimnames = if (!(is.null(n1 <- names(answer[[1L]])) &
>                   is.null(n2 <- names(answer))))
>                   list(n1, n2))
>             else answer
>         }
>         else answer
>     }
>     else answer
> }
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Covariance-Variance-Matrix-and-For-Loops-tp3859441p3862347.html
> Sent from the R help mailing list archive at Nabble.com.
>

From dwinsemius at comcast.net Sat Oct 1 17:25:17 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Sat, 1 Oct 2011 11:25:17 -0400
Subject: [R] Is the output of survfit.coxph survival or baseline survival?
In-Reply-To: <1317432668597-3861919.post@n4.nabble.com>
References: <1317432668597-3861919.post@n4.nabble.com>
Message-ID:

On Sep 30, 2011, at 9:31 PM, koshihaku wrote:

> Dear all,
> I am confused with the output of survfit.coxph.
> Someone said that the survival given by summary(survfit.coxph) is the
> baseline survival S_0, but some said that is the survival
> S=S_0^exp{beta*x}.
>
> Which one is correct?

It may depend on who _some_ and _someone_ mean by S_0 and who they are. I have in the past posted erroneous answers, but the name on which to search the archives is 'Terry Therneau'. My current understanding is that the survival S_0 is the estimated survival for a hypothetical subject whose continuous and discrete covariates are all at their means. (But I have been wrong before.)
Here is some of what Therneau has said about it:

http://finzi.psych.upenn.edu/Rhelp10/2010-October/257941.html
http://finzi.psych.upenn.edu/Rhelp10/2009-March/190341.html
http://finzi.psych.upenn.edu/Rhelp10/2009-February/189768.html

>
> By the way, if I use "newdata=" in the survfit, does that mean the survival
> is estimated by the value of covariates in the new data frame?

In one sense yes, but in another sense, no. If you have a Cox fit and you supply newdata, the beta estimates and the baseline survival come from the original data. If you just give it a formula, then there is no newdata argument, only a data argument. Try this:

library(survival)  # provides coxph(), Surv() and the ovarian data
fit <- coxph( Surv(futime, fustat)~rx, data=ovarian)
plot( survfit(fit, newdata=data.frame(rx=1) ) )
plot( survfit( Surv(futime, fustat)~rx, data=ovarian) )

Then flipping back and forth between those curves might clarify, at least to the extent that I understand this question.

And here's a pathological extrapolation:

plot(survfit(fit, newdata=data.frame(rx=1:3)))
# There is no rx=3 in the original data but it wasn't defined as a factor when given to coxph.

# Just checked to see if you could extrapolate past the end of a range of factors and very sensibly you cannot.
> fit <- coxph( Surv(futime, fustat)~factor(rx), data=ovarian)
> plot(survfit(fit, newdata=data.frame(rx=1:3)))
Error in model.frame.default(data = data.frame(rx = 1:3), formula = ~factor(rx), :
  factor 'factor(rx)' has new level(s) 3

--
David.

From lea_olsen at yahoo.ca Sat Oct 1 16:40:14 2011
From: lea_olsen at yahoo.ca (Spartina)
Date: Sat, 1 Oct 2011 07:40:14 -0700 (PDT)
Subject: [R] Nearest neighbour in a matrix
In-Reply-To: <1317081392784-3845747.post@n4.nabble.com>
References: <1317081392784-3845747.post@n4.nabble.com>
Message-ID: <1317480014024-3862973.post@n4.nabble.com>

Hi, sorry for the late reply. I just wanted to thank both of you for your answers.
They were helpful, and thank you also for mentioning the website with the tutorials, which is a most helpful resource.

Cheers,
Léa

--
View this message in context: http://r.789695.n4.nabble.com/Nearest-neighbour-in-a-matrix-tp3845747p3862973.html
Sent from the R help mailing list archive at Nabble.com.

From aidan.corcoran11 at gmail.com Sat Oct 1 16:58:03 2011
From: aidan.corcoran11 at gmail.com (Aidan Corcoran)
Date: Sat, 1 Oct 2011 15:58:03 +0100
Subject: [R] error using ddply to generate means
Message-ID:

Dear list,

I encounter an error when I try to use ddply to generate means as follows:

fun3<-structure(list(sector = structure(list(gics_sector_name = c("Financials",
"Financials", "Materials", "Materials")), .Names = "gics_sector_name",
row.names = structure(c("UBSN VX Equity",
"LLOY LN Equity", "AI FP Equity", "AKE FP Equity"), .Dim = 4L), class
= "data.frame"),
   bebitpcchg = c(-0.567449058550428, 0.99600643852127, NA,
   -42.7587478692081), ticker = c("UBSN VX Equity", "LLOY LN Equity",
   "AI FP Equity", "AKE FP Equity")), .Names = c("sector", "bebitpcchg",
"ticker"), row.names = c(12L, 24L, 36L, 48L), class = "data.frame")

fun3

   gics_sector_name  bebitpcchg         ticker
12       Financials  -0.5674491 UBSN VX Equity
24       Financials   0.9960064 LLOY LN Equity
36        Materials          NA   AI FP Equity
48        Materials -42.7587479  AKE FP Equity

fun4<-ddply(fun3,c("sector"),summarise,avgbebitchg=mean(bebitpcchg,na.rm=TRUE))

Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
  undefined columns selected

This is a small sample of my data. I'm probably overlooking some problem in my syntax, but would be very grateful if someone could point it out.

Thanks in advance,
Aidan.
sessionInfo()

R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252    LC_MONETARY=English_Ireland.1252
[4] LC_NUMERIC=C                     LC_TIME=English_Ireland.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] plm_1.2-7          sandwich_2.2-7     MASS_7.3-12        Formula_1.0-1      nlme_3.1-100
 [6] bdsmatrix_1.0      RBloomberg_0.4-149 rJava_0.8-8        gtools_2.6.2       gdata_2.8.2
[11] ggplot2_0.8.9      proto_0.3-9.2      zoo_1.7-4          reshape_0.8.4      plyr_1.6

loaded via a namespace (and not attached):
[1] lattice_0.19-23 tools_2.13.0

From djmuser at gmail.com Sat Oct 1 18:21:14 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Sat, 1 Oct 2011 09:21:14 -0700
Subject: [R] error using ddply to generate means
In-Reply-To:
References:
Message-ID:

Hi:

Here's the problem:

> str(fun3)
'data.frame': 4 obs. of 3 variables:
 $ sector :'data.frame': 4 obs. of 1 variable:
  ..$ gics_sector_name: chr "Financials" "Financials" "Materials" "Materials"
 $ bebitpcchg: num -0.567 0.996 NA -42.759
 $ ticker : chr "UBSN VX Equity" "LLOY LN Equity" "AI FP Equity" "AKE FP Equity"

Notice that fun3$sector is a data frame, not a variable. By leaving fun3 intact, the summary is gotten with

ddply(fun3, .(sector$gics_sector_name), summarise,
      avgbebitchg=mean(bebitpcchg,na.rm=TRUE))
  sector$gics_sector_name avgbebitchg
1              Financials   0.2142787
2               Materials -42.7587479

You might consider reframing fun3, pardon the pun.

HTH,
Dennis

On Sat, Oct 1, 2011 at 7:58 AM, Aidan Corcoran wrote:
> Dear list,
>
> I encounter an error when I try to use ddply to generate means as follows:
>
> fun3<-structure(list(sector = structure(list(gics_sector_name = c("Financials",
> "Financials", "Materials", "Materials")), .Names = "gics_sector_name",
> row.names = structure(c("UBSN VX Equity",
> "LLOY LN Equity", "AI FP Equity", "AKE FP Equity"), .Dim = 4L), class
> = "data.frame"),
>    bebitpcchg = c(-0.567449058550428, 0.99600643852127, NA,
>    -42.7587478692081), ticker = c("UBSN VX Equity", "LLOY LN Equity",
>    "AI FP Equity", "AKE FP Equity")), .Names = c("sector", "bebitpcchg",
> "ticker"), row.names = c(12L, 24L, 36L, 48L), class = "data.frame")
>
> fun3
>
>    gics_sector_name  bebitpcchg         ticker
> 12       Financials  -0.5674491 UBSN VX Equity
> 24       Financials   0.9960064 LLOY LN Equity
> 36        Materials          NA   AI FP Equity
> 48        Materials -42.7587479  AKE FP Equity
>
>
> fun4<-ddply(fun3,c("sector"),summarise,avgbebitchg=mean(bebitpcchg,na.rm=TRUE))
>
> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing =
> decreasing)) :
>   undefined columns selected
>
> This is a small sample of my data. I'm probably overlooking some
> problem in my syntax, but would be very grateful if someone could
> point it out.
>
> Thanks in advance,
> Aidan.
>
> sessionInfo()
>
> R version 2.13.0 (2011-04-13)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_Ireland.1252  LC_CTYPE=English_Ireland.1252
> LC_MONETARY=English_Ireland.1252
> [4] LC_NUMERIC=C                     LC_TIME=English_Ireland.1252
>
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets
> methods   base
>
> other attached packages:
>  [1] plm_1.2-7          sandwich_2.2-7     MASS_7.3-12
> Formula_1.0-1      nlme_3.1-100
>  [6] bdsmatrix_1.0      RBloomberg_0.4-149 rJava_0.8-8
> gtools_2.6.2       gdata_2.8.2
> [11] ggplot2_0.8.9      proto_0.3-9.2      zoo_1.7-4
> reshape_0.8.4      plyr_1.6
>
> loaded via a namespace (and not attached):
> [1] lattice_0.19-23 tools_2.13.0
>
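Following Dennis's diagnosis, another option is to flatten the nested sector column before grouping. A minimal base-R sketch (tapply() in place of ddply(), so no add-on package is needed); the data frame is rebuilt by hand with the same nested structure that Aidan's dput() output describes:

```r
# Rebuild the structure from the post: 'sector' is itself a one-column
# data frame nested inside fun3, which is what trips up ddply()
fun3 <- data.frame(
  bebitpcchg = c(-0.567449058550428, 0.99600643852127, NA, -42.7587478692081),
  ticker = c("UBSN VX Equity", "LLOY LN Equity", "AI FP Equity", "AKE FP Equity"),
  stringsAsFactors = FALSE
)
fun3$sector <- data.frame(
  gics_sector_name = c("Financials", "Financials", "Materials", "Materials"),
  stringsAsFactors = FALSE
)
stopifnot(is.data.frame(fun3$sector))  # confirms the column is nested

# Flatten: replace the nested data frame by its single character column
fun3$sector <- fun3$sector$gics_sector_name

# Group means per sector, dropping the NA value within each group
avg <- tapply(fun3$bebitpcchg, fun3$sector, mean, na.rm = TRUE)
print(avg)
```

Once sector has been flattened, the original ddply(fun3, c("sector"), summarise, ...) call should also run, since sector is then an ordinary character column.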
>

From simonfuller9 at gmail.com Sat Oct 1 17:32:29 2011
From: simonfuller9 at gmail.com (sf1979)
Date: Sat, 1 Oct 2011 08:32:29 -0700 (PDT)
Subject: [R] Covariance-Variance Matrix and For Loops
In-Reply-To:
References: <1317380432602-3859441.post@n4.nabble.com> <1317405259405-3860580.post@n4.nabble.com> <1317457329612-3862347.post@n4.nabble.com>
Message-ID: <1317483149021-3863098.post@n4.nabble.com>

That's very helpful Michael, thank you. I will add it to the arsenal.

--
View this message in context: http://r.789695.n4.nabble.com/Covariance-Variance-Matrix-and-For-Loops-tp3859441p3863098.html
Sent from the R help mailing list archive at Nabble.com.

From jami5490 at mylaurier.ca Sat Oct 1 18:24:53 2011
From: jami5490 at mylaurier.ca (spicymchaggis101)
Date: Sat, 1 Oct 2011 09:24:53 -0700 (PDT)
Subject: [R] error while using shapiro.test()
In-Reply-To:
References: <1317423562223-3861535.post@n4.nabble.com>
Message-ID: <1317486293251-3863205.post@n4.nabble.com>

Thank you very much! Your response solved my issue. I needed to determine the probability of normality for word types per page.

--
View this message in context: http://r.789695.n4.nabble.com/error-while-using-shapiro-test-tp3861535p3863205.html
Sent from the R help mailing list archive at Nabble.com.

From sandeep.coepcivil at gmail.com Sat Oct 1 19:03:58 2011
From: sandeep.coepcivil at gmail.com (Sandeep Patil)
Date: Sat, 1 Oct 2011 12:03:58 -0500
Subject: [R] Gstat - Installation Fail _ download source and compile help ...
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From mentor_ at gmx.net Sat Oct 1 20:34:47 2011
From: mentor_ at gmx.net (syrvn)
Date: Sat, 1 Oct 2011 11:34:47 -0700 (PDT)
Subject: [R] Create web applications to run R scripts
Message-ID: <1317494087210-3863457.post@n4.nabble.com>

Hello,

is there anything similar to the Rwui package to create web applications to run R scripts?
Many thanks,
syrvn

--
View this message in context: http://r.789695.n4.nabble.com/Create-web-applications-to-run-R-scripts-tp3863457p3863457.html
Sent from the R help mailing list archive at Nabble.com.

From edd at debian.org Sat Oct 1 20:43:06 2011
From: edd at debian.org (Dirk Eddelbuettel)
Date: Sat, 1 Oct 2011 18:43:06 +0000
Subject: [R] Create web applications to run R scripts
In-Reply-To: <1317494087210-3863457.post@n4.nabble.com>
References: <1317494087210-3863457.post@n4.nabble.com>
Message-ID: <20111001184306.GA26143@master.debian.org>

On Sat, Oct 01, 2011 at 11:34:47AM -0700, syrvn wrote:
> Hello,
>
> is there anything similar to the Rwui package to create web applications to
> run R scripts?

There is an entire section of the R FAQ devoted to this.

Dirk

--
Three out of two people have difficulties with fractions.

From juliet.hannah at gmail.com Sat Oct 1 22:05:40 2011
From: juliet.hannah at gmail.com (Juliet Hannah)
Date: Sat, 1 Oct 2011 16:05:40 -0400
Subject: [R] Printing an xtable with type = html
In-Reply-To: <4E82F51B.3080601@auckland.ac.nz>
References: <4E82F51B.3080601@auckland.ac.nz>
Message-ID:

Some of the comments in this post may be informative to you:

http://r.789695.n4.nabble.com/improve-formatting-of-HTML-table-td3736299.html

On Wed, Sep 28, 2011 at 6:21 AM, David Scott wrote:
>
> I have been playing around with producing tables using xtable and the type =
> "html" argument when printing. For example, if xtbl is the output of a
> dataframe which has been run through xtable, using the command:
>
> print(xtbl, type = "html",
>       html.table.attributes = "border = '1', align = 'center'")
>
> I would be interested to see other examples of the use of xtable to produce
> html. There is a whole vignette on using xtable to produce all sorts of
> tables for incorporation into a TeX document but I have found no examples of
> producing html with any table attributes.
>
> Ideally xtable should be able to access a css file but I don't see any
> mechanism for doing that. Perhaps someone can enlighten me.
>
> David Scott
>
> --
> _________________________________________________________________
> David Scott     Department of Statistics
>                 The University of Auckland, PB 92019
>                 Auckland 1142,    NEW ZEALAND
> Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
> Email:  d.scott at auckland.ac.nz,  Fax: +64 9 373 7018
>

From friendly at yorku.ca Sat Oct 1 23:09:44 2011
From: friendly at yorku.ca (Michael Friendly)
Date: Sat, 01 Oct 2011 17:09:44 -0400
Subject: [R] Understanding the workflow between sweave, R and Latex
In-Reply-To: <1317388132555-3859762.post@n4.nabble.com>
References: <1317384235559-3859612.post@n4.nabble.com> <4E85B8E7.3040506@gmail.com> <1317388132555-3859762.post@n4.nabble.com>
Message-ID: <4E878198.9070301@yorku.ca>

On 9/30/2011 9:08 AM, syrvn wrote:
> Hi Duncan,
>
> I use Eclipse and StatET plus TexClipse and Sweave which comes with the
> StatET package.
> So for me it is basically one click as well to produce the pdf from the
> .Rnw file.
>
> I installed the MacTex live 2011 version on my computer and thought it might
> actually be
> easy to find out how and where latex searches for packages. But I did not
> find the place
> where all this is coded...
>
First, since this is Mac-related, you would probably get better answers on the R-sig-mac list.

Second, most LaTeX distributions support both a system 'texmf' tree and one or more local/user texmf trees, which you can configure with something like Preferences somewhere in MacTeX.
On my linux system, I use ~/texmf/ and simply copied Sweave.sty to ~/texmf/tex/latex/misc/Sweave.sty (if my path-memory serves). No more worries (unless Sweave.sty is changed in a new R distro). Finally, it does help to RTFM, where you can find other options under ?RweaveLatex in the Details section: The LaTeX file generated needs to contain the line \usepackage{Sweave}, and if this is not present in the Sweave source file (possibly in a comment), it is inserted by the RweaveLatex driver. If stylepath = TRUE, a hard-coded path to the file 'Sweave.sty' in the R installation is set in place of Sweave. The hard-coded path makes the LaTeX file less portable, but avoids the problem of installing the current version of 'Sweave.sty' to some place in your TeX input path. However, TeX may not be able to process the hard-coded path if it contains spaces (as it often will under Windows) or TeX special characters. The default for stylepath is now taken from the environment variable SWEAVE_STYLEPATH_DEFAULT, or is FALSE if that is unset or empty. If set, it should be exactly TRUE or FALSE: any other values are taken as FALSE. As from R 2.12.0, the simplest way for frequent Sweave users to ensure that 'Sweave.sty' is in the TeX input path is to add 'R_HOME/share/texmf' as a 'texmf tree' ('root directory' in the parlance of the 'MiKTeX settings' utility). By default, 'Sweave.sty' sets the width of all included graphics to: \setkeys{Gin}{width=0.8\textwidth}. From jim.silverton at gmail.com Sun Oct 2 03:06:08 2011 From: jim.silverton at gmail.com (Jim Silverton) Date: Sat, 1 Oct 2011 21:06:08 -0400 Subject: [R] Sum of Probabilities in a matrix... Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From daniel at umd.edu Sun Oct 2 03:17:30 2011 From: daniel at umd.edu (Daniel Malter) Date: Sat, 1 Oct 2011 18:17:30 -0700 (PDT) Subject: [R] Poor performance of "Optim" In-Reply-To: <1317511253729-3863969.post@n4.nabble.com> References: <1317449543774-3862229.post@n4.nabble.com> <1317511253729-3863969.post@n4.nabble.com> Message-ID: <1317518250373-3864133.post@n4.nabble.com> With respect, your statement that R's optim does not give you a reliable estimator is bogus. As pointed out before, this would depend on when optim believes it's good enough and stops optimizing. In particular if you stretch out x, then it is plausible that the likelihood function will become flat enough "earlier," so that the numerical optimization will stop earlier (i.e., optim will "think" that the slope of the likelihood function is flat enough to be considered zero and stop earlier than it will for more condensed data). After all, maximum likelihood is a numerical method and thus an approximation. I would venture to say that what you describe lies in the nature of this method. You could also follow the good advice given earlier, by increasing the number of iterations or decreasing the tolerance. However, check the example below: for all purposes it's really close enough and has nothing to do with optim being "unreliable."

n<-1000
x<-rnorm(n)
y<-0.5*x+rnorm(n)
z<-ifelse(y>0,1,0)

X<-cbind(1,x)
b<-matrix(c(0,0),nrow=2)

#Probit
reg<-glm(z~x,family=binomial("probit"))

#Optim reproducing probit (with minor deviations due to difference in method)
LL<-function(b){-sum(z*log(pnorm(X%*%b))+(1-z)*log(1-pnorm(X%*%b)))}
optim(c(0,0),LL)

#Multiply x by 2 and repeat optim
X[,2]=2*X[,2]
optim(c(0,0),LL)

HTH, Daniel yehengxin wrote: > > What I tried is just a simple binary probit model. Create a random data > and use "optim" to maximize the log-likelihood function to estimate the > coefficients. (e.g. u = 0.1+0.2*x + e, e is standard normal.
And y = (u > > 0), y indicating a binary choice variable) > > If I estimate coefficient of "x", I should be able to get a value close to > 0.2 if sample is large enough. Say I got 0.18. > > If I expand x by twice and reestimate the model, which coefficient should > I get? 0.09, right? > > But with "optim", I got something different. When I do the same thing in > both Gauss and Matlab, I can exactly get 0.09, evidencing that the > coefficient estimator is reliable. But R's "optim" does not give me a > reliable estimator. > -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3864133.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Sun Oct 2 03:30:57 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Sat, 1 Oct 2011 18:30:57 -0700 Subject: [R] Sum of Probabilities in a matrix... In-Reply-To: References: Message-ID: Let's make it a data frame instead:

# Read the data from your post into a data frame named d:
d <- read.table(textConnection("
0.98 2
0.2 1
0.01 2
0.5 1
0.6 6"))
closeAllConnections()
# Use the ave() function and append the result to d:
d$sumprob <- with(d, ave(V1, V2, FUN = sum))
> d
    V1 V2 sumprob
1 0.98  2    0.99
2 0.20  1    0.70
3 0.01  2    0.99
4 0.50  1    0.70
5 0.60  6    0.60

HTH, Dennis On Sat, Oct 1, 2011 at 6:06 PM, Jim Silverton wrote: > Hi all, > I have 2 columns in a matrix, one of which is a column of probabilities and > the other is simply a vector of integers. I want to sum all the > probabilities with the same integer value and put it in a new column. > For example, > If my matrix is:
>
> 0.98  2
> 0.2   1
> 0.01  2
> 0.5   1
> 0.6   6
>
> Then I should get:
> 0.98  2  0.99
> 0.2   1  0.70
> 0.01  2  0.99
> 0.5   1  0.70
> 0.6   6  0.60
>
> Any help is greatly appreciated. > -- > Thanks, > Jim.
From erinm.hodgess at gmail.com Sun Oct 2 04:42:23 2011 From: erinm.hodgess at gmail.com (Erin Hodgess) Date: Sat, 1 Oct 2011 21:42:23 -0500 Subject: [R] R Studio and Rcmdr/RcmdrPlugins Message-ID: Dear R People: Hope you're having a great weekend! Anyhow, I'm currently experimenting with R Studio on a web server, which is the best thing since sliced bread, Coca Cola, etc. My one question: there is a way to show plots, but is there a way to show Rcmdr or its plugins, please? I tried, but it doesn't seem to work. Thanks so much, Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodgess at gmail.com From rolf.turner at xtra.co.nz Sun Oct 2 04:42:51 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Sun, 02 Oct 2011 15:42:51 +1300 Subject: [R] Adding axis to an ellipse: "ellipse" package In-Reply-To: <1317464771788-3862491.post@n4.nabble.com> References: <1317138835047-3847954.post@n4.nabble.com> <4E82CCC6.3030904@xtra.co.nz> <9ED12467-7CFE-4B3F-9EFE-1AD186EE1EB4@unine.ch> <4E838670.8070504@xtra.co.nz> <1317464771788-3862491.post@n4.nabble.com> Message-ID: <4E87CFAB.3030406@xtra.co.nz> See comments in-line: On 01/10/11 23:26, Antoine wrote: > Dear Rolf, > > I tried to follow your advice but the results I am getting still seem > strange to me. See below an example of a matrix:
See below an example of a matrix: > > datamat<- matrix(c(2.2, 0.4, 0.4, 2.8), 2, 2) > plot(ellipse(datamat),type='l') > eigenval<- eigen(datamat)$values > eigenvect<- eigen(datamat)$vectors > eigenscl<- eigenvect * sqrt(eigenval) * (qchisq(0.95,2))# One solution to > get rescale This is wrong because you are multiplying the i-th row of ``eigenvect'' the square root of the i-th eigenvalue. The *columns* of ``eigenvect'' are the eigenvectors. So you need to multiply the j-th column by the square root of the j-th eigenvalue. > v1<- (eigenvect[,1])*(sqrt(eigenval[1]))*(qchisq(0.95,2))#or directly > rescale the vectors needed > v2<- (eigenvect[,2])*(sqrt(eigenval[2]))*(qchisq(0.95,2)) The foregoing is correct except that you need to take the square root of the chi-squared quantile. > #Or > v1<- eigenscl[1,] > v2<- eigenscl[2,] > segments(-v1[1],-v1[2],v1[1],v1[2]) > segments(-v2[1],-v2[2],v2[1],v2[2]) > > The vectors don't seem to be scaled properly and I don't see what I am doing > wrong. Any ideas? Here is correct code: require(ellipse) S <- matrix(c(2.2, 0.4, 0.4, 2.8), 2, 2) # Note the ``asp=1'' which makes orthogonal lines # look orthogonal: plot(ellipse(S),type='l',asp=1) E <- eigen(S) Val <- E$values Vec <- E$vectors v1 <- sqrt(Val[1]*qchisq(0.95,2))*Vec[,1] v2 <- sqrt(Val[2]*qchisq(0.95,2))*Vec[,2] segments(-v1[1],-v1[2],v1[1],v1[2]) segments(-v2[1],-v2[2],v2[1],v2[2]) cheers, Rolf Turner From rolf.turner at xtra.co.nz Sun Oct 2 04:57:29 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Sun, 02 Oct 2011 15:57:29 +1300 Subject: [R] Sum of Probabilities in a matrix... In-Reply-To: References: Message-ID: <4E87D319.2020603@xtra.co.nz> On 02/10/11 14:06, Jim Silverton wrote: > Hi all, > I have 2 columns in a mtrix, one of which is a column of probabilities and > the other is simply a vector of integers. I want to sum all the > probabilities with the same integer value and put it in a new column. 
> For example, > If my matrix is:
>
> 0.98 2
> 0.2 1
> 0.01 2
> 0.5 1
> 0.6 6
>
> Then I should get:
> 0.98 2 0.99
> 0.2 1 0.70
> 0.01 2 0.99
> 0.5 1 0.70
> 0.6 6 0.60
>
> Any help is greatly appreciated.

Suppose your matrix is called "m". Execute:

> ttt <- tapply(m[,1],m[,2],sum)
> m <- cbind(m,ttt[match(m[,2],names(ttt))])
> dimnames(m) <- NULL # To tidy up a bit.

You get:

> m
     [,1] [,2] [,3]
[1,] 0.98    2 0.99
[2,] 0.20    1 0.70
[3,] 0.01    2 0.99
[4,] 0.50    1 0.70
[5,] 0.60    6 0.60

Easy-peasy. cheers, Rolf Turner From felipnunes at gmail.com Sun Oct 2 04:59:21 2011 From: felipnunes at gmail.com (Felipe Nunes) Date: Sat, 1 Oct 2011 19:59:21 -0700 Subject: [R] Tobit Fixed Effects In-Reply-To: References: <1316069761779-3814830.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From erinm.hodgess at gmail.com Sun Oct 2 05:37:37 2011 From: erinm.hodgess at gmail.com (Erin Hodgess) Date: Sat, 1 Oct 2011 22:37:37 -0500 Subject: [R] getting list of data.frame names Message-ID: Dear R People: This is probably a very simple question. I know that if I want to get a list of the classes of the objects in the workspace, I can do this:

> sapply(ls(), function(x)class(get(x)))
           a        a1.df            b            d
      "list" "data.frame"    "integer"    "numeric"

Now I want to get just the data frames.

> sapply(ls(), function(x)class(get(x))=="data.frame")
    a a1.df     b     d
FALSE  TRUE FALSE FALSE

However, I would like the names of the data frames, rather than the True/False for the objects. I've been trying all sorts of combinations/permutations with no success. Any suggestions would be much appreciated.
Thanks, Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodgess at gmail.com From jwiley.psych at gmail.com Sun Oct 2 05:46:10 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Sat, 1 Oct 2011 20:46:10 -0700 Subject: [R] getting list of data.frame names In-Reply-To: References: Message-ID: Hi Erin, Try this:

names(which(sapply(.GlobalEnv, is.data.frame)))

Cheers, Josh On Sat, Oct 1, 2011 at 8:37 PM, Erin Hodgess wrote: > Dear R People: > > This is probably a very simple question. I know that if I want to get > a list of the classes of the objects in the workspace, I can do this:
>
>> sapply(ls(), function(x)class(get(x)))
>            a        a1.df            b            d
>       "list" "data.frame"    "integer"    "numeric"
>
> Now I want to get just the data frames.
>> sapply(ls(), function(x)class(get(x))=="data.frame")
>     a a1.df     b     d
> FALSE  TRUE FALSE FALSE
>
> However, I would like the names of the data frames, rather than the > True/False for the objects. > > I've been trying all sorts of combinations/permutations with no success. > > Any suggestions would be much appreciated. > > Thanks, > Sincerely, > Erin > > -- > Erin Hodgess > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: erinm.hodgess at gmail.com > -- Joshua Wiley Ph.D.
Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ From nbhardwaj at gmail.com Sat Oct 1 21:55:17 2011 From: nbhardwaj at gmail.com (Nitin Bhardwaj) Date: Sat, 1 Oct 2011 15:55:17 -0400 Subject: [R] Fitting 3 beta distributions Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From 1 at VictoriasJourney.com Sat Oct 1 22:38:30 2011 From: 1 at VictoriasJourney.com (Victoria_Stuart) Date: Sat, 1 Oct 2011 13:38:30 -0700 (PDT) Subject: [R] Entering data into a multi-way array? In-Reply-To: <1317438749773-3862054.post@n4.nabble.com> References: <1317438749773-3862054.post@n4.nabble.com> Message-ID: <1317501510635-3863670.post@n4.nabble.com> I am trying to replicate the script, appended below. My data is in OOCalc files. The script (below) synthesizes a dataset (it serves as a "tutorial"), but I will need to get my data from OOCalc into R for use in that script (which uses arrays). I've worked my way through the script, and understand how most of it works (except the first bit - Step 1 - which is irrelevant to me, anyway).

[begin script]
### Supplementary material with the paper
### Interpretation of ANOVA models for microarray data using PCA
### J.R. de Haan et al. Bioinformatics (2006)
### Please cite this paper when you use this code in a publication.
### Written by J.R. de Haan, December 18, 2006

### Step 1: a synthetic dataset of 500 genes is generated with 5 classes
### 1 unresponsive genes (300 genes)
### 2 constant genes (50 genes)
### 3 profile 1 (50 genes)
### 4 profile 2 (50 genes)
### 5 profile 3 (50 genes)

#generate synthetic dataset with similar dimensions:
# 500 genes, 3 replicates, 10 timepoints, 4 treatments
X <- array(0, c(500, 3, 10, 4))
labs.synth <- c(rep(1, 300), rep(2, 50), rep(3, 50), rep(4, 50), rep(5, 50))
gnames <- cbind(labs.synth, labs.synth)
#print(dim(gnames))
gnames[1:300,2] <- "A"
gnames[301:350,2] <- "B"
gnames[351:400,2] <- "C"
gnames[401:450,2] <- "D"
gnames[451:500,2] <- "E"

### generate 300 "noise" genes with expressions slightly larger than
### the detection limit (class 1)
X[labs.synth==1,1,,] <- rnorm(length(X[labs.synth==1,1,,]), mean=50, sd=40)
X[labs.synth==1,2,,] <- X[labs.synth==1,1,,] + rnorm(length(X[labs.synth==1,1,,]), mean=0, sd=10)
X[labs.synth==1,3,,] <- X[labs.synth==1,1,,] + rnorm(length(X[labs.synth==1,1,,]), mean=0, sd=10)

# generate 50 stable genes at two levels (class 2)
X[301:325,1,,] <- rnorm(length(X[301:325,1,,]), mean=1500, sd=40)
X[301:325,2,,] <- X[301:325,1,,] + rnorm(length(X[301:325,1,,]), mean=0, sd=10)
X[301:325,3,,] <- X[301:325,1,,] + rnorm(length(X[301:325,1,,]), mean=0, sd=10)
X[326:350,1,,] <- rnorm(length(X[326:350,1,,]), mean=3000, sd=40)
X[326:350,2,,] <- X[326:350,1,,] + rnorm(length(X[326:350,1,,]), mean=0, sd=10)
X[326:350,3,,] <- X[326:350,1,,] + rnorm(length(X[326:350,1,,]), mean=0, sd=10)

# generate 50 genes with profile 1 (class 3)
increase.range <- matrix(rep(1:50, 10), ncol=10, byrow=FALSE)
profA3 <- matrix(rep(c(10, 60, 110, 150, 150, 150, 150, 150, 150, 150) , 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,,1] <- profA3 + rnorm(length(profA3), mean=0, sd=40)
profB3 <- matrix(rep(c(10, 100, 220, 280, 280, 280, 280, 280, 280, 280), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,1:10,2] <- profB3 + rnorm(length(profA3), mean=0, sd=40)
profC3 <- matrix(rep(c(10, 120, 300, 300, 280, 280, 280, 280, 280, 280), 50), ncol=10, byrow=TRUE) * increase.range
X[351:400,1,1:10,3] <- profC3 + rnorm(length(profA3), mean=0, sd=40)
profD3 <- matrix(rep(c(100, 75, 50, 50, 50, 50, 50, 50, 75, 100), 50), ncol=10, byrow=TRUE)
X[351:400,1,1:10,4] <- profD3 + rnorm(length(profA3), mean=0, sd=40)
#again replicates
X[351:400,2,,] <- X[351:400,1,,] + rnorm(length(X[351:400,2,,]), mean=0, sd=10)
X[351:400,3,,] <- X[351:400,1,,] + rnorm(length(X[351:400,3,,]), mean=0, sd=10)

# generate 50 genes with profile 2 (class 4)
increase.range <- matrix(rep(1:50, 10), ncol=10, byrow=FALSE)
profA4 <- matrix(rep(c(10, 60, 110, 150, 125, 100, 75, 50, 50, 50) , 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,,1] <- profA4 + rnorm(length(profA4), mean=0, sd=40)
profB4 <- matrix(rep(c(10, 100, 220, 280, 200, 150, 100, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,1:10,2] <- profB4 + rnorm(length(profA4), mean=0, sd=40)
profC4 <- matrix(rep(c(10, 150, 300, 220, 150, 100, 50, 50, 50, 50), 50), ncol=10, byrow=TRUE) * increase.range
X[401:450,1,1:10,3] <- profC4 + rnorm(length(profA4), mean=0, sd=40)
profD4 <- matrix(rep(c(150, 100, 50, 50, 75, 75, 75, 100, 100, 100), 50), ncol=10, byrow=TRUE)
X[401:450,1,1:10,4] <- profD4 + rnorm(length(profA4), mean=0, sd=40)
#again replicates
X[401:450,2,,] <- X[401:450,1,,] + rnorm(length(X[401:450,2,,]), mean=0, sd=10)
X[401:450,3,,] <- X[401:450,1,,] + rnorm(length(X[401:450,3,,]), mean=0, sd=10)

# generate 50 genes with profile 3 (class 5)
increase.range <- matrix(rep(1:25, 20), ncol=10, byrow=FALSE)
profA4 <- matrix(rep((200 - c(10, 60, 110, 150, 125, 100, 75, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,,1] <- profA4 + rnorm(length(profA4), mean=0, sd=40)
profB4 <- matrix(rep((200 - c(10, 100, 180, 200, 200, 150, 100, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,1:10,2] <- profB4 + rnorm(length(profA4), mean=0, sd=40)
profC4 <- matrix(rep((200 - c(10, 150, 200, 180, 150, 100, 50, 50, 50, 50)), 50), ncol=10, byrow=TRUE) * increase.range
X[451:500,1,1:10,3] <- profC4 + rnorm(length(profA4), mean=0, sd=40)
profD4 <- matrix(rep((200 - c(150, 100, 50, 50, 75, 75, 75, 100, 100, 100)), 50), ncol=10, byrow=TRUE)
X[451:500,1,1:10,4] <- profD4 + rnorm(length(profA4), mean=0, sd=40)
#again replicates
X[451:500,2,,] <- X[451:500,1,,] + rnorm(length(X[451:500,2,,]), mean=0, sd=10)
X[451:500,3,,] <- X[451:500,1,,] + rnorm(length(X[451:500,3,,]), mean=0, sd=10)

# Step 2: Now the effects for different factors in the ANOVA model
# can be calculated:

# subtraction of the general mean
x <- X - mean(X, na.rm=TRUE)
tpoints <- c(1, 3, 6, 12,c( 24*c(1, 2, 3, 5, 8, 12)))
nrgenes <- dim(x)[1]

# calculation of the three main effects
cat("calculating main effects\n")
timemeans <- apply(x, 3, mean, na.rm=TRUE)
treatmeans <- apply(x, 4, mean, na.rm=TRUE)
genemeans <- apply(x, 1, mean, na.rm=TRUE)
#par(mfrow=c(2, 3))

# calculation of the interaction effects
# interaction time-treatment
cat("calculating interaction time-treatment\n")
mean.ti.tr <- apply(x, c(3,4), mean, na.rm=TRUE)
#print(dim(mean.ti.tr))
time1.m <- matrix(rep(timemeans, 4), nrow=10, byrow=FALSE)
tr1.m <- matrix(rep(treatmeans, 10), nrow=10, byrow=TRUE)
int.ti.tr <- (mean.ti.tr - time1.m) - tr1.m

# interaction time-gene
cat("calculating interaction time-gene\n")
mean.ti.gene <- apply(x, c(1,3), mean, na.rm=TRUE)
#print(dim(mean.ti.gene))
time2.m <- matrix(rep(timemeans, dim(x)[1]), nrow=dim(x)[1], byrow=TRUE)
gene2.m <- matrix(rep(genemeans, 10), nrow=dim(x)[1], byrow=FALSE)
int.ti.gene <- (mean.ti.gene - time2.m) - gene2.m

# interaction gene-treatment
cat("calculating interaction gene-treatment\n")
mean.gene.tr <- apply(x, c(1,4), mean, na.rm=TRUE)
#print(dim(mean.gene.tr))
gene3.m <- matrix(rep(genemeans, 4), ncol=4, byrow=FALSE)
tr3.m <- matrix(rep(treatmeans, dim(x)[1]), ncol=4, byrow=TRUE)
int.gene.tr <- (mean.gene.tr - gene3.m) - tr3.m

# calculation of 3 factor interaction
cat("calculating 3 factor interaction\n")
mean.gene.time.tr <- apply(x, c(1, 3, 4), mean, na.rm=TRUE)
#print(dim(mean.gene.time.tr))
ar.ti.tr <- array(0, c(nrgenes, 10, 4))
for (i in 1:nrgenes){ar.ti.tr[i,,] <- int.ti.tr}
ar.ti.gene <- array(0, c(nrgenes, 10, 4))
for (i in 1:4){ar.ti.gene[,,i] <- int.ti.gene}
ar.gene.tr <- array(0, c(nrgenes, 10, 4))
for (i in 1:10){ar.gene.tr[,i,] <- int.gene.tr}
ar.ti <- array(0, c(nrgenes, 10, 4))
for (i in 1:4){ar.ti[,,i] <- time2.m}
ar.tr <- array(0, c(nrgenes, 10, 4))
for (i in 1:10){ar.tr[,i,] <- tr3.m}
ar.gene <- array(0, c(nrgenes, 10, 4))
for (i in 1:4){ar.gene[,,i] <- gene2.m}
int.gene.time.tr <- mean.gene.time.tr - ar.ti - ar.tr - ar.gene - ar.ti.tr - ar.ti.gene - ar.gene.tr
imat.gtt <- cbind(int.gene.time.tr[,,1], int.gene.time.tr[,,2], int.gene.time.tr[,,3], int.gene.time.tr[,,4])

### calculation of error term
error.term1 <- abs(sweep(x, c(1, 3, 4), mean.gene.time.tr))
cell.error <- apply(error.term1, c(1, 3, 4), mean, na.rm=TRUE)
mncn.error <- scale(cbind(cell.error[,,1], cell.error[,,2], cell.error[,,3], cell.error[,,4]), scale=FALSE)

### Step 3: The results of the model can now be inspected with
### different plots (PCA for the interactions)
#plot(timemeans, type="l")
#plot(treatmeans, type="l")
#plot(genemeans, type="l")
#source("biplot.R")
#jbiplot(1, 2, int.ti.gene, gnames[,2], as.character(tpoints), rep(2, 10))
#jbiplot(1, 2, int.gene.tr, gnames[,2], c("TR1", "TR2", "TR3", "UNT"), 2:5)
#jbiplot(1, 2, int.ti.tr, as.character(tpoints), c("TR1", "TR2", "TR3", "UNT"), rep(2, 4))
#jbiplot(1, 2, imat.gtt, gnames[,2], as.character(rep(tpoints, 4)), sort(rep(2:5, 10)))
[end script]

-- View this message in context: http://r.789695.n4.nabble.com/Entering-data-into-a-multi-way-array-tp3862054p3863670.html Sent from the R help mailing list archive at Nabble.com.
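One way to get spreadsheet data into the genes x replicates x time points x treatments array that the script builds as X is sketched below. This is an editorial illustration, not code from the thread: it assumes the OOCalc sheet is saved as CSV in "long" format (one row per measurement) and read with read.csv(); the file name and the column names gene, replicate, time, treatment and value are hypothetical. The expand.grid() call only fakes such a table so the sketch runs on its own; the array is then filled with matrix indexing.

```r
# Stand-in for a sheet exported from OOCalc as CSV (one row per measurement).
# With real data this would be something like d <- read.csv("expression.csv");
# the file name and the column names below are hypothetical.
d <- expand.grid(gene      = paste0("g", 1:5),
                 replicate = 1:3,
                 time      = c(1, 3, 6, 12, 24, 48, 72, 120, 192, 288),
                 treatment = c("TR1", "TR2", "TR3", "UNT"),
                 stringsAsFactors = FALSE)
d$value <- seq_len(nrow(d))  # fake measurements

factors <- c("gene", "replicate", "time", "treatment")

# Levels of each factor in order of first appearance
# (use sort(unique(v)) instead if the rows are not in the desired order)
lev <- lapply(d[factors], unique)

# One row per measurement: its integer position along each of the 4 dimensions
idx <- mapply(match, d[factors], lev)

# genes x replicates x time points x treatments; cells without a
# measurement stay NA
X <- array(NA_real_, dim = lengths(lev), dimnames = lev)
X[idx] <- d$value
```

Missing cells remain NA, which the script's mean(..., na.rm=TRUE) and apply(..., mean, na.rm=TRUE) calls tolerate. Note that the script hard-codes 10 time points and 4 treatments (for example nrow=10, ncol=4 and array(0, c(nrgenes, 10, 4))), so those numbers would need to match the real data.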
From paul.heinrich.dietrich at gmail.com Sun Oct 2 02:38:54 2011 From: paul.heinrich.dietrich at gmail.com (zerfetzen) Date: Sat, 1 Oct 2011 17:38:54 -0700 (PDT) Subject: [R] Multivariate Laplace density Message-ID: <1317515934321-3864072.post@n4.nabble.com> Can anyone show how to calculate a multivariate Laplace density? Thanks. -- View this message in context: http://r.789695.n4.nabble.com/Multivariate-Laplace-density-tp3864072p3864072.html Sent from the R help mailing list archive at Nabble.com. From vince.pileggi at ontario.ca Sun Oct 2 05:20:10 2011 From: vince.pileggi at ontario.ca (Vince) Date: Sat, 1 Oct 2011 20:20:10 -0700 (PDT) Subject: [R] deSolve - Function daspk on DAE system - Error Message-ID: <1317525610060-3864298.post@n4.nabble.com> I'm getting this error on the attached code and breaking my head but can't figure it out. Any help is much appreciated. Thanks, Vince CODE:

library(deSolve)
Res_DAE=function(t, y, dy, pars) {
  with(as.list(c(y, dy, pars)), {
    res1 = -dS -dES-k2*ES
    res2 = -dP + k2*ES
    eq1 = Eo-E -ES
    eq2 = So-S -ES -P
    return(list(c(res1, res2, eq1, eq2)))
  })
}
pars <- c(Eo=0.02, So=0.02, k2=250, E=0.01); pars
yini <- c(S=0.01, ES = 0.01, P=0.0, E=0.01); yini
times <- seq(0, 0.01, by = 0.0001); times
dyini = c(dS=0.0, dES=0.0, dP=0.0)
## Tabular output check of matrix output
DAE <- daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, atol = 1e-10, rtol = 1e-10)

ERROR: daspk-- warning.. At T(=R1) and stepsize H (=R2) the nonlinear solver f nonlinear solver failed to converge repeatedly of with abs (H) = H repeatedly of with abs (H) = HMIN preconditioner had repeated failur 0.0000000000000D+00 0.5960464477539D-14 Warning messages: 1: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, : repeated convergence test failures on a step - inaccurate Jacobian or preconditioner? 2: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, : Returning early.
Results are accurate, as far as they go -- View this message in context: http://r.789695.n4.nabble.com/deSolve-Function-daspk-on-DAE-system-Error-tp3864298p3864298.html Sent from the R help mailing list archive at Nabble.com. From xye78 at hotmail.com Sun Oct 2 01:20:53 2011 From: xye78 at hotmail.com (yehengxin) Date: Sat, 1 Oct 2011 16:20:53 -0700 (PDT) Subject: [R] Poor performance of "Optim" In-Reply-To: <1317449543774-3862229.post@n4.nabble.com> References: <1317449543774-3862229.post@n4.nabble.com> Message-ID: <1317511253729-3863969.post@n4.nabble.com> What I tried is just a simple binary probit model. Create a random data and use "optim" to maximize the log-likelihood function to estimate the coefficients. (e.g. u = 0.1+0.2*x + e, e is standard normal. And y = (u > 0), y indicating a binary choice variable) If I estimate coefficient of "x", I should be able to get a value close to 0.2 if sample is large enough. Say I got 0.18. If I expand x by twice and reestimate the model, which coefficient should I get? 0.09, right? But with "optim", I got something different. When I do the same thing in both Gauss and Matlab, I can exactly get 0.09, evidencing that the coefficient estimator is reliable. But R's "optim" does not give me a reliable estimator. -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3863969.html Sent from the R help mailing list archive at Nabble.com. From xye78 at hotmail.com Sun Oct 2 04:43:39 2011 From: xye78 at hotmail.com (yehengxin) Date: Sat, 1 Oct 2011 19:43:39 -0700 (PDT) Subject: [R] Poor performance of "Optim" In-Reply-To: <1317518250373-3864133.post@n4.nabble.com> References: <1317449543774-3862229.post@n4.nabble.com> <1317511253729-3863969.post@n4.nabble.com> <1317518250373-3864133.post@n4.nabble.com> Message-ID: <1317523419304-3864243.post@n4.nabble.com> Thank you for your response! 
But the problem is when I estimate a model without knowing the true coefficients, how can I know which "reltol" is good enough? "1e-8" or "1e-10"? Why can commercial packages automatically determine the right "reltol" but R cannot? -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3864243.html Sent from the R help mailing list archive at Nabble.com. From xye78 at hotmail.com Sun Oct 2 04:58:56 2011 From: xye78 at hotmail.com (yehengxin) Date: Sat, 1 Oct 2011 19:58:56 -0700 (PDT) Subject: [R] Poor performance of "Optim" In-Reply-To: <1317449543774-3862229.post@n4.nabble.com> References: <1317449543774-3862229.post@n4.nabble.com> Message-ID: <1317524336687-3864271.post@n4.nabble.com> Oh, I think I got it. Commercial packages limit the number of decimals shown. -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3864271.html Sent from the R help mailing list archive at Nabble.com. From marius.hofert at math.ethz.ch Sun Oct 2 10:20:55 2011 From: marius.hofert at math.ethz.ch (Hofert Jan Marius) Date: Sun, 2 Oct 2011 08:20:55 +0000 Subject: [R] plot: how to fix the ratio of the plot box? Message-ID: <63838993-358A-41D5-BBB8-64C14306ED65@math.ethz.ch> Dear all, this should be trivial, but I couldn't figure out how to solve it... I would like to have a plot with fixed aspect ratio of 1. Whenever I resize the Quartz window, the axes are extended so that the plot fills the whole window. However, if you have different extensions for the different axes, the plot does not look like "a square" anymore (i.e., aspect ratio 1). The same of course happens if you print it to .pdf (ultimate goal). How can I fix the plot box (formed by the axes) ratio to be 1, meaning that the plot box is a square no matter how I resize the Quartz window? I searched for this and found: http://tolstoy.newcastle.edu.au/R/help/05/04/2888.html It is more or less recommended to use lattice's xyplot for that. 
Is there no solution for base graphics? [I know that the extension is by default 4% and that's great, but the size of the Quartz window should not change this (which it does if you resize the window accordingly)]. Cheers, Marius Minimal example:

u <- runif(10)
pdf(width=5, height=5)
plot(u, u, asp=1, xlim=c(0,1), ylim=c(0,1), main="My title")
dev.off()

From daniel at umd.edu Sun Oct 2 11:07:09 2011 From: daniel at umd.edu (Daniel Malter) Date: Sun, 2 Oct 2011 02:07:09 -0700 (PDT) Subject: [R] Poor performance of "Optim" In-Reply-To: <1317518250373-3864133.post@n4.nabble.com> References: <1317449543774-3862229.post@n4.nabble.com> <1317511253729-3863969.post@n4.nabble.com> <1317518250373-3864133.post@n4.nabble.com> Message-ID: <1317546429293-3864681.post@n4.nabble.com> Ben Bolker sent me a private email rightly pointing out that I was factually wrong when I wrote that ML /is/ a numerical method (I had written sloppily and under time pressure). He is of course right to point out that not all maximum likelihood estimators require numerical methods to solve. Further, only numerical optimization will show the behavior discussed in this post for the given reasons. (I hope this post isn't yet another blooper of mine at 5 a.m.) Best, Daniel Daniel Malter wrote: > > With respect, your statement that R's optim does not give you a reliable > estimator is bogus. As pointed out before, this would depend on when optim > believes it's good enough and stops optimizing. In particular if you > stretch out x, then it is plausible that the likelihood function will > become flat enough "earlier," so that the numerical optimization will stop > earlier (i.e., optim will "think" that the slope of the likelihood > function is flat enough to be considered zero and stop earlier than it > will for more condensed data). After all, maximum likelihood is a > numerical method and thus an approximation.
I would venture to say that > what you describe lies in the nature of this method. You could also follow > the good advice given earlier, by increasing the number of iterations or > decreasing the tolerance. > > However, check the example below: for all purposes it's really close > enough and has nothing to do with optim being "unreliable." > > n<-1000 > x<-rnorm(n) > y<-0.5*x+rnorm(n) > z<-ifelse(y>0,1,0) > > X<-cbind(1,x) > b<-matrix(c(0,0),nrow=2) > > #Probit > reg<-glm(z~x,family=binomial("probit")) > > #Optim reproducing probit (with minor deviations due to difference in > method) > LL<-function(b){-sum(z*log(pnorm(X%*%b))+(1-z)*log(1-pnorm(X%*%b)))} > optim(c(0,0),LL) > > #Multiply x by 2 and repeat optim > X[,2]=2*X[,2] > optim(c(0,0),LL) > > HTH, > Daniel > > > > yehengxin wrote: >> >> What I tried is just a simple binary probit model. Create a random data >> and use "optim" to maximize the log-likelihood function to estimate the >> coefficients. (e.g. u = 0.1+0.2*x + e, e is standard normal. And y = >> (u > 0), y indicating a binary choice variable) >> >> If I estimate coefficient of "x", I should be able to get a value close >> to 0.2 if sample is large enough. Say I got 0.18. >> >> If I expand x by twice and reestimate the model, which coefficient should >> I get? 0.09, right? >> >> But with "optim", I got something different. When I do the same thing in >> both Gauss and Matlab, I can exactly get 0.09, evidencing that the >> coefficient estimator is reliable. But R's "optim" does not give me a >> reliable estimator. >> > -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3864681.html Sent from the R help mailing list archive at Nabble.com. 
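Daniel's suggestion to tighten the tolerance can be tried directly. The sketch below is an editorial illustration, not code from the thread. It simulates the probit setup yehengxin describes, then fits it with optim() using method = "BFGS" and a reltol far below the default of sqrt(.Machine$double.eps) (about 1.5e-8), and finally compares the result with glm(). The choice of BFGS, the control values and the helper name negLL are assumptions. It also swaps log(pnorm(...)) for pnorm(..., log.p = TRUE) to avoid log(0) in the tails.

```r
# Simulated probit data in the spirit of the thread: u = 0.1 + 0.2*x + e
set.seed(1)
n <- 1000
x <- rnorm(n)
z <- as.numeric(0.1 + 0.2 * x + rnorm(n) > 0)

# Negative probit log-likelihood (negLL is a hypothetical helper name);
# 1 - pnorm(eta) is written as pnorm(-eta) on the log scale
negLL <- function(b, X, z) {
  eta <- drop(X %*% b)
  -sum(z * pnorm(eta, log.p = TRUE) + (1 - z) * pnorm(-eta, log.p = TRUE))
}

X1 <- cbind(1, x)      # original regressor
X2 <- cbind(1, 2 * x)  # regressor stretched by a factor of 2

# Quasi-Newton method with a much tighter relative tolerance than the default
ctrl <- list(reltol = 1e-12, maxit = 1000)
fit1 <- optim(c(0, 0), negLL, X = X1, z = z, method = "BFGS", control = ctrl)
fit2 <- optim(c(0, 0), negLL, X = X2, z = z, method = "BFGS", control = ctrl)

ref <- coef(glm(z ~ x, family = binomial("probit")))  # IRLS reference fit

rbind(glm = ref, optim_x = fit1$par, optim_2x = fit2$par)
fit1$par[2] / fit2$par[2]  # should be very close to 2
```

If the slope ratio comes out essentially 2 with these settings, that points to the optimizer's default stopping rule, not to the estimator, as the source of the discrepancy reported in the thread.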
From daniel at umd.edu Sun Oct 2 11:11:33 2011 From: daniel at umd.edu (Daniel Malter) Date: Sun, 2 Oct 2011 02:11:33 -0700 (PDT) Subject: [R] Poor performance of "Optim" In-Reply-To: <1317546429293-3864681.post@n4.nabble.com> References: <1317449543774-3862229.post@n4.nabble.com> <1317511253729-3863969.post@n4.nabble.com> <1317518250373-3864133.post@n4.nabble.com> <1317546429293-3864681.post@n4.nabble.com> Message-ID: <1317546693584-3864688.post@n4.nabble.com> And there I caught myself with the next blooper: it wasn't Ben Bolker, it was Bert Gunter who pointed that out. :) Daniel Malter wrote: > > Ben Bolker sent me a private email rightfully correcting me that was > factually wrong when I wrote that ML /is/ a numerical method (I had > written sloppily and under time pressure). He is of course right to point > out that not all maximum likelihood estimators require numerical methods > to solve. Further, only numerical optimization will show the behavior > discussed in this post for the given reasons. (I hope this post isn't yet > another blooper of mine at 5 a.m. in the morning). > > Best, > Daniel > > > Daniel Malter wrote: >> >> With respect, your statement that R's optim does not give you a reliable >> estimator is bogus. As pointed out before, this would depend on when >> optim believes it's good enough and stops optimizing. In particular if >> you stretch out x, then it is plausible that the likelihood function will >> become flat enough "earlier," so that the numerical optimization will >> stop earlier (i.e., optim will "think" that the slope of the likelihood >> function is flat enough to be considered zero and stop earlier than it >> will for more condensed data). After all, maximum likelihood is a >> numerical method and thus an approximation. I would venture to say that >> what you describe lies in the nature of this method. You could also >> follow the good advice given earlier, by increasing the number of >> iterations or decreasing the tolerance. 
>> >> However, check the example below: for all purposes it's really close >> enough and has nothing to do with optim being "unreliable." >> >> n<-1000 >> x<-rnorm(n) >> y<-0.5*x+rnorm(n) >> z<-ifelse(y>0,1,0) >> >> X<-cbind(1,x) >> b<-matrix(c(0,0),nrow=2) >> >> #Probit >> reg<-glm(z~x,family=binomial("probit")) >> >> #Optim reproducing probit (with minor deviations due to difference in >> method) >> LL<-function(b){-sum(z*log(pnorm(X%*%b))+(1-z)*log(1-pnorm(X%*%b)))} >> optim(c(0,0),LL) >> >> #Multiply x by 2 and repeat optim >> X[,2]=2*X[,2] >> optim(c(0,0),LL) >> >> HTH, >> Daniel >> >> >> >> yehengxin wrote: >>> >>> What I tried is just a simple binary probit model. Create a random >>> data and use "optim" to maximize the log-likelihood function to estimate >>> the coefficients. (e.g. u = 0.1+0.2*x + e, e is standard normal. And >>> y = (u > 0), y indicating a binary choice variable) >>> >>> If I estimate coefficient of "x", I should be able to get a value close >>> to 0.2 if sample is large enough. Say I got 0.18. >>> >>> If I expand x by twice and reestimate the model, which coefficient >>> should I get? 0.09, right? >>> >>> But with "optim", I got something different. When I do the same thing >>> in both Gauss and Matlab, I can exactly get 0.09, evidencing that the >>> coefficient estimator is reliable. But R's "optim" does not give me a >>> reliable estimator. >>> >> > -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3864688.html Sent from the R help mailing list archive at Nabble.com. From osoramirez at gmail.com Sun Oct 2 10:18:09 2011 From: osoramirez at gmail.com (=?iso-8859-1?Q?Oscar_Ram=EDrez?=) Date: Sun, 2 Oct 2011 02:18:09 -0600 Subject: [R] Ipad on R Message-ID: <000301cc80db$d324f6f0$796ee4d0$@gmail.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From jim at bitwrit.com.au Sun Oct 2 13:08:54 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Sun, 02 Oct 2011 22:08:54 +1100 Subject: [R] plot: how to fix the ratio of the plot box? In-Reply-To: <63838993-358A-41D5-BBB8-64C14306ED65@math.ethz.ch> References: <63838993-358A-41D5-BBB8-64C14306ED65@math.ethz.ch> Message-ID: <4E884646.7080505@bitwrit.com.au> On 10/02/2011 07:20 PM, Hofert Jan Marius wrote: > Dear all, > > this should be trivial, but I couldn't figure out how to solve it... I would like to have a plot with fixed aspect ratio of 1. Whenever I resize the Quartz window, the axes are extended so that the plot fills the whole window. However, if you have different extensions for the different axes, the plot does not look like "a square" anymore (i.e., aspect ratio 1). The same of course happens if you print it to .pdf (ultimate goal). How can I fix the plot box (formed by the axes) ratio to be 1, meaning that the plot box is a square no matter how I resize the Quartz window? > > I searched for this and found: http://tolstoy.newcastle.edu.au/R/help/05/04/2888.html > It is more or less recommended to use lattice's xyplot for that. Is there no solution for base graphics? > [I know that the extension is by default 4% and that's great, but the size of the Quartz window should not change this (which it does if you resize the window accordingly)]. > > Cheers, > > Marius > > Minimal example: > u<- runif(10) > pdf(width=5, height=5) > plot(u, u, asp=1, xlim=c(0,1), ylim=c(0,1), main="My title") > dev.off() > Hi Marius, Have you tried: par(pty="s") after you open the device and before plotting? Jim From marius.hofert at math.ethz.ch Sun Oct 2 13:21:19 2011 From: marius.hofert at math.ethz.ch (Hofert Jan Marius) Date: Sun, 2 Oct 2011 11:21:19 +0000 Subject: [R] plot: how to fix the ratio of the plot box?
In-Reply-To: <4E884646.7080505@bitwrit.com.au> References: <63838993-358A-41D5-BBB8-64C14306ED65@math.ethz.ch> <4E884646.7080505@bitwrit.com.au> Message-ID: ahh, perfect, thanks. Cheers, Marius On 2011-10-02, at 13:08 , Jim Lemon wrote: > On 10/02/2011 07:20 PM, Hofert Jan Marius wrote: >> Dear all, >> >> this should be trivial, but I couldn't figure out how to solve it... I would like to have a plot with fixed aspect ratio of 1. Whenever I resize the Quartz window, the axes are extended so that the plot fills the whole window. However, if you have different extensions for the different axes, the plot does not look like "a square" anymore (i.e., aspect ratio 1). The same of course happens if you print it to .pdf (ultimate goal). How can I fix the plot box (formed by the axes) ratio to be 1, meaning that the plot box is a square no matter how I resize the Quartz window? >> >> I searched for this and found: http://tolstoy.newcastle.edu.au/R/help/05/04/2888.html >> It is more or less recommended to use lattice's xyplot for that. Is there no solution for base graphics? >> [I know that the extension is by default 4% and that's great, but the size of the Quartz window should not change this (which it does if you resize the window accordingly)]. >> >> Cheers, >> >> Marius >> >> Minimal example: >> u<- runif(10) >> pdf(width=5, height=5) >> plot(u, u, asp=1, xlim=c(0,1), ylim=c(0,1), main="My title") >> dev.off() >> > Hi Marius, > Have you tried: > > par(pty="s") > > after you open the device and before plotting? > > Jim > From sarah.goslee at gmail.com Sun Oct 2 13:39:47 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Sun, 2 Oct 2011 07:39:47 -0400 Subject: [R] Ipad on R In-Reply-To: <000301cc80db$d324f6f0$796ee4d0$@gmail.com> References: <000301cc80db$d324f6f0$796ee4d0$@gmail.com> Message-ID: 2011/10/2 Oscar Ramírez : > It is possible to install R on Ipad 2?
> This discussion predates the iPad 2, but the licensing restrictions likely still apply: http://www.r-statistics.com/2010/06/could-we-run-a-statistical-analysis-on-iphoneipad-using-r/ One-word answer: no. Two-word answer: Not legally. But do read the discussion at the above link. -- Sarah Goslee http://www.functionaldiversity.org From cecilia.carmo at ua.pt Sun Oct 2 13:48:15 2011 From: cecilia.carmo at ua.pt (Cecilia Carmo) Date: Sun, 2 Oct 2011 12:48:15 +0100 Subject: [R] subset in dataframes Message-ID: <000001cc80f9$2c484e90$84d8ebb0$@carmo@ua.pt> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sarah.goslee at gmail.com Sun Oct 2 14:00:59 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Sun, 2 Oct 2011 08:00:59 -0400 Subject: [R] subset in dataframes In-Reply-To: <4e884fca.9955df0a.5093.fffff3f7SMTPIN_ADDED@mx.google.com> References: <4e884fca.9955df0a.5093.fffff3f7SMTPIN_ADDED@mx.google.com> Message-ID: Hi, On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo wrote: > I need help in subsetting a dataframe: > > > > data1<-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004, > > 2001,2002,2003,2004,2001,2002,2003,2004), > > firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),x=c(11,22,-32,25,-26,47,85,98, > > 101,14,87,56,12,43,67,54), > > y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540)) Thank you for providing a reproducible example. > > > data1 > > > > I want to keep the firms where all x>0 (where there are no negative values > in x) > > So my output should be: > > year firm x y > > 1 2001 3 101 1010 > > 2 2002 3 14 140 > > 3 2003 3 87 870 > > 4 2004 3 56 560 > > 5 2001 4 12 120 > > 6 2002 4 43 430 > > 7 2003 4 67 670 > > 8 2004
4 54 540 > > > > So I'm doing: > > data2<-data1[data1$firm%in%subset(data1,data1$x>0),] > > data2 > What about finding which ones have negative values and should be deleted, > unique(data1$firm[data1$x <= 0]) [1] 1 2 And then deleting them? > data1[!(data1$firm %in% unique(data1$firm[data1$x <= 0])),] year firm x y 9 2001 3 101 1010 10 2002 3 14 140 11 2003 3 87 870 12 2004 3 56 560 13 2001 4 12 120 14 2002 4 43 430 15 2003 4 67 670 16 2004 4 54 540 > > But the result is > > [1] year firm x y > > <0 rows> (or 0-length row.names) > If you look at just the result of part of your code, subset(data1,data1$x>0) it isn't giving at all what you need for the next step: the entire data frame for x>0. Sarah -- Sarah Goslee http://www.functionaldiversity.org From sandorl at gmail.com Sun Oct 2 14:30:12 2011 From: sandorl at gmail.com (=?iso-8859-1?Q?L=E1szl=F3_S=E1ndor?=) Date: Sun, 2 Oct 2011 08:30:12 -0400 Subject: [R] Difference between ~lp() or simply ~ in R's locfit? Message-ID: <8E1B3643-1B4E-4C10-AEE2-1F58153F49E0@gmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Achim.Zeileis at uibk.ac.at Sun Oct 2 14:44:41 2011 From: Achim.Zeileis at uibk.ac.at (Achim Zeileis) Date: Sun, 2 Oct 2011 14:44:41 +0200 (CEST) Subject: [R] Fitting 3 beta distributions In-Reply-To: References: Message-ID: On Sat, 1 Oct 2011, Nitin Bhardwaj wrote: > Hi, > I want to fit 3 beta distributions to my data which ranges between 0 and 1. > What are the functions that I can easily call and specify that 3 beta > distributions should be fitted? > I have already looked at normalmixEM and fitdistr but they don't seem to be > applicable (normalmixEM is only for fitting normal dist and fitdistr will > only fit 1 distribution, not 3). Is that right? From your description above, I guess that (a) you want to fit a _mixture_ of 3 beta distributions, and (b) have tried to use "mixtools" and "MASS" so far.
Based on these assumptions: fitdistr() does not fit mixture models. "mixtools" does fit mixtures and the accompanying paper has an example where a nonparametric model is applied to mixtures of beta distributions. Furthermore, the "betareg" package has a function betamix() which can fit mixtures of beta regression models (including the special case of no covariates). Both "mixtools" and "betareg" have been published in JSS, as indicated when calling citation("mixtools") and citation("betareg"): http://www.jstatsoft.org/v32/i06/ http://www.jstatsoft.org/v34/i02/ The latter does not yet contain the betamix() function. As an example, one can use the artificial data generated in Section 5.2: library("betareg") set.seed(123) y1 <- c(rbeta(150, 0.3 * 4, 0.7 * 4), rbeta(50, 0.5 * 4, 0.5 * 4)) y2 <- c(rbeta(100, 0.3 * 4, 0.7 * 4), rbeta(100, 0.3 * 8, 0.7 * 8)) d <- data.frame(y1, y2) bm1 <- betamix(y1 ~ 1 | 1, data = d, k = 2) bm2 <- betamix(y2 ~ 1 | 1, data = d, k = 2) where one should note that, compared to R's parametrization of the beta distribution, two transformations are employed: from shape1/shape2 to mu/phi and then adding logit/log link functions. > Also, my data has 26 million data points. What can I do to reduce the > computation time with the suggested function? I think all functions above will have problems with 26 million observations directly. One alternative - if the fitting function takes weights - would be to use a representative sample or to compute weights on a possibly coarsened grid. hth, Z > thanks a lot in advance, > eagerly waiting for any input. > Best > Nitin > > [[alternative HTML version deleted]] > > From cecilia.carmo at ua.pt Sun Oct 2 15:08:13 2011 From: cecilia.carmo at ua.pt (Cecilia Carmo) Date: Sun, 2 Oct 2011 14:08:13 +0100 Subject: [R] subset in dataframes In-Reply-To: References: <4e884fca.9955df0a.5093.fffff3f7SMTPIN_ADDED@mx.google.com> Message-ID: <000e01cc8104$57ce4550$076acff0$@carmo@ua.pt> Thank you very much.
My dataframe has thousands of firms, how can I delete all of those with x<0 and keep another dataframe with firms where all x>0? Thank you again. Cecília Carmo (Universidade de Aveiro - Portugal) -----Original Message----- From: Sarah Goslee [mailto:sarah.goslee at gmail.com] Sent: Sunday, 2 October 2011 13:01 To: Cecilia Carmo Cc: r-help at r-project.org Subject: Re: [R] subset in dataframes Hi, On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo wrote: > I need help in subsetting a dataframe: > > > > data1<-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004, > > 2001,2002,2003,2004,2001,2002,2003,2004), > > firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),x=c(11,22,-32,25,-26,47,85,98, > > 101,14,87,56,12,43,67,54), > > y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540)) Thank you for providing a reproducible example. > > > data1 > > > > I want to keep the firms where all x>0 (where there are no negative values > in x) > > So my output should be: > > year firm x y > > 1 2001 3 101 1010 > > 2 2002 3 14 140 > > 3 2003 3 87 870 > > 4 2004 3 56 560 > > 5 2001 4 12 120 > > 6 2002 4 43 430 > > 7 2003 4 67 670 > > 8 2004 4 54 540 > > > > So I'm doing: > > data2<-data1[data1$firm%in%subset(data1,data1$x>0),] > > data2 > What about finding which ones have negative values and should be deleted, > unique(data1$firm[data1$x <= 0]) [1] 1 2 And then deleting them? > data1[!(data1$firm %in% unique(data1$firm[data1$x <= 0])),] year firm x y 9 2001 3 101 1010 10 2002 3 14 140 11 2003 3 87 870 12 2004 3 56 560 13 2001 4 12 120 14 2002 4 43 430 15 2003 4 67 670 16 2004 4 54 540 > > But the result is > > [1] year firm x y > > <0 rows> (or 0-length row.names) > If you look at just the result of part of your code, subset(data1,data1$x>0) it isn't giving at all what you need for the next step: the entire data frame for x>0.
Sarah -- Sarah Goslee http://www.functionaldiversity.org From sarah.goslee at gmail.com Sun Oct 2 15:20:33 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Sun, 2 Oct 2011 09:20:33 -0400 Subject: [R] subset in dataframes In-Reply-To: <4e88623f.cd05e30a.354f.fffff580SMTPIN_ADDED@mx.google.com> References: <4e884fca.9955df0a.5093.fffff3f7SMTPIN_ADDED@mx.google.com> <4e88623f.cd05e30a.354f.fffff580SMTPIN_ADDED@mx.google.com> Message-ID: Hi, On Sun, Oct 2, 2011 at 9:08 AM, Cecilia Carmo wrote: > Thank you very much. > > My dataframe has thousands of firms, how can I delete all of those with x<0 > and keep another dataframe with firms where all x>0? How does that differ from your original question? What doesn't work for you in the answer I already gave? Sarah > Thank you again. > > Cecília Carmo > (Universidade de Aveiro - Portugal) > > -----Original Message----- > From: Sarah Goslee [mailto:sarah.goslee at gmail.com] > Sent: Sunday, 2 October 2011 13:01 > To: Cecilia Carmo > Cc: r-help at r-project.org > Subject: Re: [R] subset in dataframes > > Hi, > > On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo wrote: >> I need help in subsetting a dataframe: >> >> >> >> data1<-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004, >> >> 2001,2002,2003,2004,2001,2002,2003,2004), >> >> firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),x=c(11,22,-32,25,-26,47,85,98, >> >> 101,14,87,56,12,43,67,54), >> >> y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540)) > > > Thank you for providing a reproducible example. > >> >> >> data1 >> >> >> >> I want to keep the firms where all x>0 (where there are no negative values >> in x) >> >> So my output should be: >> >> year firm x y >> >> 1 2001 3 101 1010 >> >> 2 2002 3 14 140 >> >> 3 2003 3 87 870 >> >> 4 2004 3 56 560 >> >> 5 2001 4 12 120 >> >> 6 2002 4 43 430 >> >> 7 2003 4 67 670 >> >> 8 2004
4 54 540 >> >> >> >> So I'm doing: >> >> data2<-data1[data1$firm%in%subset(data1,data1$x>0),] >> >> data2 >> > > > What about finding which ones have negative values and should be deleted, > >> unique(data1$firm[data1$x <= 0]) > [1] 1 2 > > And then deleting them? > >> data1[!(data1$firm %in% unique(data1$firm[data1$x <= 0])),] > year firm x y > 9 2001 3 101 1010 > 10 2002 3 14 140 > 11 2003 3 87 870 > 12 2004 3 56 560 > 13 2001 4 12 120 > 14 2002 4 43 430 > 15 2003 4 67 670 > 16 2004 4 54 540 > > >> >> But the result is >> >> [1] year firm x y >> >> <0 rows> (or 0-length row.names) >> > > > If you look at just the result of part of your code, > subset(data1,data1$x>0) > it isn't giving at all what you need for the next step: the entire > data frame for x>0. > -- Sarah Goslee http://www.functionaldiversity.org From cecilia.carmo at ua.pt Sun Oct 2 16:24:48 2011 From: cecilia.carmo at ua.pt (Cecilia Carmo) Date: Sun, 2 Oct 2011 15:24:48 +0100 Subject: [R] subset in dataframes In-Reply-To: References: <4e884fca.9955df0a.5093.fffff3f7SMTPIN_ADDED@mx.google.com> <4e88623f.cd05e30a.354f.fffff580SMTPIN_ADDED@mx.google.com> Message-ID: <001101cc810f$0afe6b00$20fb4100$@carmo@ua.pt> Sarah, Sorry for being ignorant. I was doing something wrong. It works perfectly. Thank you. Cecília Carmo -----Original Message----- From: Sarah Goslee [mailto:sarah.goslee at gmail.com] Sent: Sunday, 2 October 2011 14:21 To: Cecilia Carmo Cc: r-help at r-project.org Subject: Re: [R] subset in dataframes Hi, On Sun, Oct 2, 2011 at 9:08 AM, Cecilia Carmo wrote: > Thank you very much. > > My dataframe has thousands of firms, how can I delete all of those with x<0 > and keep another dataframe with firms where all x>0? How does that differ from your original question? What doesn't work for you in the answer I already gave? Sarah > Thank you again.
> > Cecília Carmo > (Universidade de Aveiro - Portugal) > > -----Original Message----- > From: Sarah Goslee [mailto:sarah.goslee at gmail.com] > Sent: Sunday, 2 October 2011 13:01 > To: Cecilia Carmo > Cc: r-help at r-project.org > Subject: Re: [R] subset in dataframes > > Hi, > > On Sun, Oct 2, 2011 at 7:48 AM, Cecilia Carmo wrote: >> I need help in subsetting a dataframe: >> >> >> >> data1<-data.frame(year=c(2001,2002,2003,2004,2001,2002,2003,2004, >> >> 2001,2002,2003,2004,2001,2002,2003,2004), >> >> firm=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),x=c(11,22,-32,25,-26,47,85,98, >> >> 101,14,87,56,12,43,67,54), >> >> y=c(110,220,302,250,260,470,850,980,1010,140,870,560,120,430,670,540)) > > > Thank you for providing a reproducible example. > >> >> >> data1 >> >> >> >> I want to keep the firms where all x>0 (where there are no negative values >> in x) >> >> So my output should be: >> >> year firm x y >> >> 1 2001 3 101 1010 >> >> 2 2002 3 14 140 >> >> 3 2003 3 87 870 >> >> 4 2004 3 56 560 >> >> 5 2001 4 12 120 >> >> 6 2002 4 43 430 >> >> 7 2003 4 67 670 >> >> 8 2004 4 54 540 >> >> >> >> So I'm doing: >> >> data2<-data1[data1$firm%in%subset(data1,data1$x>0),] >> >> data2 >> > > > What about finding which ones have negative values and should be deleted, > >> unique(data1$firm[data1$x <= 0]) > [1] 1 2 > > And then deleting them? > >> data1[!(data1$firm %in% unique(data1$firm[data1$x <= 0])),] > year firm x y > 9 2001 3 101 1010 > 10 2002 3 14 140 > 11 2003 3 87 870 > 12 2004 3 56 560 > 13 2001 4 12 120 > 14 2002 4 43 430 > 15 2003 4 67 670 > 16 2004 4 54 540 > > >> >> But the result is >> >> [1] year firm x y >> >> <0 rows> (or 0-length row.names) >> > > > If you look at just the result of part of your code, > subset(data1,data1$x>0) > it isn't giving at all what you need for the next step: the entire > data frame for x>0.
> -- Sarah Goslee http://www.functionaldiversity.org From ngokangmin at gmail.com Sun Oct 2 14:56:14 2011 From: ngokangmin at gmail.com (Kang Min) Date: Sun, 2 Oct 2011 05:56:14 -0700 (PDT) Subject: [R] Overlapping plot in lattice In-Reply-To: References: <865c756d-a5da-43c0-bf79-37e7a234443c@db5g2000vbb.googlegroups.com> Message-ID: <098e5610-6b50-4b6c-9d3f-144494382bd4@u13g2000vbx.googlegroups.com> Thanks Gabor, that was exactly what I needed. On Sep 30, 9:00 pm, Gabor Grothendieck wrote: > On Fri, Sep 30, 2011 at 3:01 AM, Kang Min wrote: > > Hi all, > > > I was wondering if there's an equivalent to par(new=T) of the plot > > function in lattice. I'm plotting an xyplot, and I would like to > > highlight one point by plotting that one point again using a different > > symbol. > > > For example, where 6 is highlighted: > > plot(1:10, xlim=c(0,10), ylim=c(0,10)) > > par(new=T) > > plot(6,6, xlim=c(0,10), ylim=c(0,10), pch=16) > > Try this: > > library(lattice) > xyplot(1:10 ~ 1:10, xlim=c(0,10), ylim=c(0,10)) > trellis.focus() > panel.points(6, 6, pch = 6) > trellis.unfocus() > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com From erik.b.svensson at gmail.com Sun Oct 2 16:05:11 2011 From: erik.b.svensson at gmail.com (Erik Svensson) Date: Sun, 2 Oct 2011 07:05:11 -0700 (PDT) Subject: [R] Find all duplicate records Message-ID: <1317564311771-3865139.post@n4.nabble.com> Hello, In a data frame I want to identify ALL duplicate IDs in the example to be able to examine "OS" and "time".
(df<-data.frame(ID=c("userA", "userB", "userA", "userC"), OS=c("Win","OSX","Win", "Win64"), time=c("12:22","23:22","04:44","12:28"))) ID OS time 1 userA Win 12:22 2 userB OSX 23:22 3 userA Win 04:44 4 userC Win64 12:28 My desired output is that ALL records with the same IDs are found: userA Win 12:22 userA Win 04:44 preferably by returning logical values (TRUE FALSE TRUE FALSE) Is there a simple way to do that? [-- With duplicated(df$ID) the output will be [1] FALSE FALSE TRUE FALSE i.e. not all user A records are found With unique(df$ID) [1] userA userB userC Levels: userA userB userC i.e. one of each ID is found --] Erik Svensson -- View this message in context: http://r.789695.n4.nabble.com/Find-all-duplicate-records-tp3865139p3865139.html Sent from the R help mailing list archive at Nabble.com. From dkh25 at medschl.cam.ac.uk Sun Oct 2 14:02:35 2011 From: dkh25 at medschl.cam.ac.uk (David Humphreys) Date: Sun, 2 Oct 2011 13:02:35 +0100 Subject: [R] Arimax First-Order Transfer Function Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Sun Oct 2 16:48:38 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sun, 02 Oct 2011 16:48:38 +0200 Subject: [R] Find all duplicate records In-Reply-To: <1317564311771-3865139.post@n4.nabble.com> References: <1317564311771-3865139.post@n4.nabble.com> Message-ID: <4E8879C6.50909@statistik.tu-dortmund.de> On 02.10.2011 16:05, Erik Svensson wrote: > Hello, > In a data frame I want to identify ALL duplicate IDs in the example to be > able to examine "OS" and "time". 
> > (df<-data.frame(ID=c("userA", "userB", "userA", "userC"), > OS=c("Win","OSX","Win", "Win64"), > time=c("12:22","23:22","04:44","12:28"))) > > ID OS time > 1 userA Win 12:22 > 2 userB OSX 23:22 > 3 userA Win 04:44 > 4 userC Win64 12:28 > > My desired output is that ALL records with the same IDs are found: > > userA Win 12:22 > userA Win 04:44 See ?split or ?subset Uwe Ligges > > preferably by returning logical values (TRUE FALSE TRUE FALSE) > > Is there a simple way to do that? > > [-- With duplicated(df$ID) the output will be > [1] FALSE FALSE TRUE FALSE > i.e. not all user A records are found > > With unique(df$ID) > [1] userA userB userC > Levels: userA userB userC > i.e. one of each ID is found --] > > Erik Svensson > > -- > View this message in context: http://r.789695.n4.nabble.com/Find-all-duplicate-records-tp3865139p3865139.html > Sent from the R help mailing list archive at Nabble.com. From rvaradhan at jhmi.edu Sun Oct 2 17:36:58 2011 From: rvaradhan at jhmi.edu (Ravi Varadhan) Date: Sun, 2 Oct 2011 15:36:58 +0000 Subject: [R] Poor performance of "Optim" Message-ID: <2F9EA67EF9AE1C48A147CB41BE2E15C307B4FF@DOM-EB-MAIL2.win.ad.jhu.edu> Hi, You really need to study the documentation of "optim" carefully before you make broad generalizations. There are several algorithms available in optim. The default is a simplex-type algorithm called Nelder-Mead. I think this is an unfortunate choice as the default algorithm. Nelder-Mead is a robust algorithm that can work well for almost any kind of objective function (smooth or nasty). However, the trade-off is that it is very slow in terms of convergence rate.
For simple, smooth problems, such as yours, you should use "BFGS" (or "L-BFGS-B" if you have simple box-constraints). Also, take a look at the "optimx" package and the most recent paper in J Stat Software on optimx for a better understanding of the wide array of optimization options available in R. Best, Ravi. From jianfeng.mao at gmail.com Sun Oct 2 18:25:34 2011 From: jianfeng.mao at gmail.com (Mao Jianfeng) Date: Sun, 2 Oct 2011 18:25:34 +0200 Subject: [R] generating Venn diagram with 6 sets Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From upananda.pani at gmail.com Sun Oct 2 19:05:27 2011 From: upananda.pani at gmail.com (upananda pani) Date: Sun, 2 Oct 2011 22:35:27 +0530 Subject: [R] regarding specifying criteria for Cointegration Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ehlers at ucalgary.ca Sun Oct 2 19:08:35 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Sun, 02 Oct 2011 10:08:35 -0700 Subject: [R] error while using shapiro.test() In-Reply-To: <1317486293251-3863205.post@n4.nabble.com> References: <1317423562223-3861535.post@n4.nabble.com> <1317486293251-3863205.post@n4.nabble.com> Message-ID: <4E889A93.3050804@ucalgary.ca> On 2011-10-01 09:24, spicymchaggis101 wrote: > Thank you very much! your response solved my issue. > > I needed to determine the probability of normality for word types per page. > You may want to review just what the test does. It certainly does not give you the 'probability of normality'. A worthwhile exercise might be to test several other distributions on your data. Peter Ehlers > -- > View this message in context: http://r.789695.n4.nabble.com/error-while-using-shapiro-test-tp3861535p3863205.html > Sent from the R help mailing list archive at Nabble.com.
From jason at rampaginggeek.com Sun Oct 2 19:17:55 2011 From: jason at rampaginggeek.com (Jason Edgecombe) Date: Sun, 02 Oct 2011 13:17:55 -0400 Subject: [R] On-line machine learning packages? In-Reply-To: <681bd772-b1c7-4373-831d-baff4147c831@m5g2000vbm.googlegroups.com> References: <916c0c5c-ef7d-4a1e-9665-cc9ab5b83bd4@l28g2000yqh.googlegroups.com> <4E6D1334.5010506@rampaginggeek.com> <58b5edc4-234b-42e6-aca8-3fe7e881b5b7@g31g2000yqh.googlegroups.com> <4E6E8D3F.9010800@rampaginggeek.com> <681bd772-b1c7-4373-831d-baff4147c831@m5g2000vbm.googlegroups.com> Message-ID: <4E889CC3.5050106@rampaginggeek.com> Hello Jay, Did you find the answer to your question on incremental machine learning? If not, I found some links that might help: It appears that you might be able to do streaming/incremental machine learning in Weka: http://moa.cs.waikato.ac.nz/details/classification/using-weka/ On the above link, there is a link to a free online book on data stream mining: http://heanet.dl.sourceforge.net/project/moa-datastream/documentation/StreamMining.pdf While Weka is a separate project from R, there is an R to Weka interface available at http://cran.r-project.org/web/packages/RWeka/index.html Sadly, I didn't see any streaming/incremental machine learning packages on the CRAN machine learning task view. I would guess that your best bet is using Weka with the RWeka interface, but I'm a neophyte in the machine learning field, so please take this advice with a grain of salt. Sincerely, Jason On 09/13/2011 02:35 AM, Jay wrote: > How does sequential classification differ from running a one-off > classifier for each run?
> -> Because feedback from the previous round can and needs to be > incorporated into the next round. > > > http://lmgtfy.com/?q=R+machine+learning > -> That is a new low. I was hoping to get help; obviously I was wrong > to use this forum in the hope that somebody had already battled these > kinds of problems in R. > > > On Sep 13, 1:52 am, Jason Edgecombe wrote: >> I already provided the link to the task view, which provides a list of >> the more popular machine learning algorithms for R. >> >> Do you have a particular algorithm or technique in mind? Does it have a >> name? >> >> How does sequential classification differ from running a one-off >> classifier for each run? >> >> On 09/12/2011 05:24 AM, Jay wrote: >> >> >> >>> In my mind this sequential classification task with feedback is >>> somewhat different from a completely offline, once-off, >>> classification. Am I wrong? >>> However, it looks like the mentality on this topic is to refer me to >>> cran/google in order to look for solutions myself. Obviously I know >>> about these sources, and as I said, I used rseek.org among other >>> sources to look for solutions. I did not start this topic for fun, I'm >>> asking for help to find a suitable machine learning package that >>> readily incorporates feedback loops and online learning. If somebody >>> has experience with these kinds of problems in R, please respond. >>> Or will >>> "http://cran.r-project.org >>> Look for 'Task Views'" >>> be my next piece of advice? >>> On Sep 12, 11:31 am, Dennis Murphy wrote: >>>> http://cran.r-project.org/web/views/ >>>> Look for 'machine learning'. >>>> Dennis >>>> On Sun, Sep 11, 2011 at 11:33 PM, Jay wrote: >>>>> If the answer is so obvious, could somebody please spell it out?
>>>>> On Sep 11, 10:59 pm, Jason Edgecombe wrote: >>>>>> Try this: >>>>>> http://cran.r-project.org/web/views/MachineLearning.html >>>>>> On 09/11/2011 12:43 PM, Jay wrote: >>>>>>> Hi, >>>>>>> I used the rseek search engine to look for suitable solutions, however >>>>>>> as I was unable to find anything useful, I'm asking for help. >>>>>>> Anybody have experience with these kinds of problems? I looked into >>>>>>> dynaTree, but as information is a bit scarce and as I understand it, >>>>>>> it might not be what I'm looking for..(?) >>>>>>> BR, >>>>>>> Jay >>>>>>> On Sep 11, 7:15 pm, David Winsemius wrote: >>>>>>>> On Sep 11, 2011, at 11:42 AM, Jay wrote: >>>>>>>>> What R packages are available for performing classification tasks? >>>>>>>>> That is, when the predictor has done its job on the dataset (based on >>>>>>>>> the training set and a range of variables), feedback about the true >>>>>>>>> label will be available and this information should be integrated for >>>>>>>>> the next classification round. >>>>>>>> You should look at CRAN Task Views. Extremely easy to find from the >>>>>>>> main R-project page. >>>>>>>> -- >>>>>>>> David Winsemius, MD >>>>>>>> West Hartford, CT
From mailinglist.honeypot at gmail.com Sun Oct 2 20:16:19 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Sun, 2 Oct 2011 14:16:19 -0400 Subject: [R] On-line machine learning packages?
In-Reply-To:
References: <916c0c5c-ef7d-4a1e-9665-cc9ab5b83bd4@l28g2000yqh.googlegroups.com>
 <4E6D1334.5010506@rampaginggeek.com>
 <58b5edc4-234b-42e6-aca8-3fe7e881b5b7@g31g2000yqh.googlegroups.com>
Message-ID:

Hi Jay,

I see this thread is a bit (ok, quite) old at this point, but I see you never really got a satisfactory answer to your question.

I figured you might be interested to know that Dirk has started to wrap vowpal wabbit[1,2] into an R package, RVowpalWabbit[3,4].

The package itself is still rather bare-bones, but perhaps it can be useful to you in its current state, or perhaps the "raw" vowpal wabbit can.

You might also consider the shogun toolbox[5]. As of its 1.0 release, I believe it has incorporated vowpal wabbit in some form or another to do online learning, and it might have other online learning algos baked in. It has its own flavor of an R interface (r_static or r_modular), which might work for you if you can get it to compile.

-steve

[1] Vowpal Wabbit (home page): http://hunch.net/~vw/
[2] Vowpal Wabbit (github): https://github.com/JohnLangford/vowpal_wabbit
[3] RVowpalWabbit (CRAN): http://cran.r-project.org/web/packages/RVowpalWabbit/index.html
[4] RVowpalWabbit (R-forge): https://r-forge.r-project.org/projects/rvowpalwabbit/
[5] The shogun toolbox: http://www.shogun-toolbox.org/

On Mon, Sep 12, 2011 at 5:24 AM, Jay wrote:
> In my mind this sequential classification task with feedback is
> somewhat different from an completely offline, once-off,
> classification. Am I wrong?
> However, it looks like the mentality on this topic is to refer me to
> cran/google in order to look for solutions myself. Oblivious I know
> about these sources, and as I said, I used rseek.org among other
> sources to look for solutions. I did not start this topic for fun, I'm
> asking for help to find a suitable machine learning packages that
> readily incorporates feedback loops and online learning.
> If somebody
> has experience these kinds of problems in R, please respond.
>
>
> Or will
> "http://cran.r-project.org
> Look for 'Task Views'"
> be my next piece of advice?
>
> On Sep 12, 11:31 am, Dennis Murphy wrote:
>> http://cran.r-project.org/web/views/
>>
>> Look for 'machine learning'.
>>
>> Dennis
>>
>> On Sun, Sep 11, 2011 at 11:33 PM, Jay wrote:
>> > If the answer is so obvious, could somebody please spell it out?
>>
>> > On Sep 11, 10:59 pm, Jason Edgecombe wrote:
>> >> Try this:
>> >>
>> >> http://cran.r-project.org/web/views/MachineLearning.html
>> >>
>> >> On 09/11/2011 12:43 PM, Jay wrote:
>> >>
>> >> > Hi,
>> >>
>> >> > I used the rseek search engine to look for suitable solutions, however
>> >> > as I was unable to find anything useful, I'm asking for help.
>> >> > Anybody have experience with these kinds of problems? I looked into
>> >> > dynaTree, but as information is a bit scares and as I understand it,
>> >> > it might not be what I'm looking for..(?)
>> >>
>> >> > BR,
>> >> > Jay
>> >>
>> >> > On Sep 11, 7:15 pm, David Winsemius wrote:
>> >> >> On Sep 11, 2011, at 11:42 AM, Jay wrote:
>> >> >>
>> >> >>> What R packages are available for performing classification tasks?
>> >> >>> That is, when the predictor has done its job on the dataset (based on
>> >> >>> the training set and a range of variables), feedback about the true
>> >> >>> label will be available and this information should be integrated for
>> >> >>> the next classification round.
>> >> >>
>> >> >> You should look at CRAN Task Views. Extremely easy to find from the
>> >> >> main R-project page.
>> >> >>
>> >> >> --
>> >> >> David Winsemius, MD
>> >> >> West Hartford, CT
>

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From Peter.Brecknock at bp.com Sun Oct 2 20:18:43 2011
From: Peter.Brecknock at bp.com (Pete Brecknock)
Date: Sun, 2 Oct 2011 11:18:43 -0700 (PDT)
Subject: [R] Keep ALL duplicate records
In-Reply-To: <1317564057860-3865136.post@n4.nabble.com>
References: <1317564057860-3865136.post@n4.nabble.com>
Message-ID: <1317579523767-3865573.post@n4.nabble.com>

Erik Svensson wrote:
>
> Hello,
> In a data frame I want to identify ALL duplicate IDs in the example to be
> able to examine "OS" and "time".
>
> (df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
>   OS=c("Win","OSX","Win", "Win64"),
>   time=c("12:22","23:22","04:44","12:28")))
>
>      ID    OS  time
> 1 userA   Win 12:22
> 2 userB   OSX 23:22
> 3 userA   Win 04:44
> 4 userC Win64 12:28
>
> My desired output is that ALL records with the same IDs are found:
>
> userA   Win 12:22
> userA   Win 04:44
>
> preferably by returning logical values (TRUE FALSE TRUE FALSE)
>
> Is there a simple way to do that?
>
> [-- With duplicated(df$ID) the output will be
> [1] FALSE FALSE  TRUE FALSE
> i.e. not all user A records are found
>
> With unique(df$ID)
> [1] userA userB userC
> Levels: userA userB userC
> i.e. one of each ID is found --]
>
> Erik Svensson
>

How about ...

# All records
ALL_RECORDS <- df[df$ID==df$ID[duplicated(df$ID)],]
print(ALL_RECORDS)

# Logical Records
TRUE_FALSE <- df$ID==df$ID[duplicated(df$ID)]
print(TRUE_FALSE)

HTH

Pete

--
View this message in context: http://r.789695.n4.nabble.com/Keep-ALL-duplicate-records-tp3865136p3865573.html
Sent from the R help mailing list archive at Nabble.com.
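Comparing with `==` works for Erik's example only because exactly one ID is repeated. When several IDs repeat, `df$ID[duplicated(df$ID)]` has more than one element, so `==` recycles it position by position and can silently mark rows wrongly. The following is a minimal sketch of the same idea using `%in%`, which tests each ID against the whole set of duplicated IDs. The helper name `all_dup_ids` and the five-row data frame are hypothetical, chosen only to show the multi-ID case.

```r
# Hypothetical helper: flag every row whose ID occurs more than once.
# %in% checks membership in the full set of duplicated IDs, so it also
# works when several different IDs are repeated.
all_dup_ids <- function(id) {
  id %in% id[duplicated(id)]
}

# Two repeated IDs (userA and userB), unlike the original example
df <- data.frame(ID = c("userA", "userB", "userA", "userC", "userB"),
                 OS = c("Win", "OSX", "Win", "Win64", "OSX"))

flags <- all_dup_ids(df$ID)
print(flags)        # TRUE TRUE TRUE FALSE TRUE
print(df[flags, ])  # all four rows belonging to userA or userB
```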
From bby2103 at columbia.edu Sun Oct 2 20:47:56 2011
From: bby2103 at columbia.edu (bby2103 at columbia.edu)
Date: Sun, 02 Oct 2011 14:47:56 -0400
Subject: [R] difference between createPartition and createfold functions
Message-ID: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu>

Hello,

I'm trying to separate my dataset into 4 parts with the 4th one as the test dataset, and the other three to fit a model.

I've been searching for the difference between these 2 functions in Caret package, but the most I can get is this--

A series of test/training partitions are created using createDataPartition while createResample creates one or more bootstrap samples. createFolds splits the data into k groups.

I'm missing something here? What is the difference btw createPartition and createFold? I guess they wouldn't be equivalent.

Thank you.

Bonnie Yuan

From tlumley at uw.edu Sun Oct 2 21:12:18 2011
From: tlumley at uw.edu (Thomas Lumley)
Date: Mon, 3 Oct 2011 08:12:18 +1300
Subject: [R] Is the output of survfit.coxph survival or baseline survival?
In-Reply-To: <1317432668597-3861919.post@n4.nabble.com>
References: <1317432668597-3861919.post@n4.nabble.com>
Message-ID:

On Sat, Oct 1, 2011 at 2:31 PM, koshihaku wrote:
> Dear all,
> I am confused with the output of survfit.coxph.
> Someone said that the survival given by summary(survfit.coxph) is the
> baseline survival S_0, but some said that is the survival S=S_0^exp{beta*x}.
>
> Which one is correct?

The baseline hazard as estimated in survfit.coxph is the hazard when all covariates are equal to the sample mean (or the stratum mean for a stratified model). The means that it is using are available in the $means component of the coxph object.

It is not the hazard extrapolated to all covariates equal zero.
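The following is a minimal sketch of the distinction, assuming the survival package's bundled `lung` data and a one-covariate model, both picked only for illustration. Called without `newdata`, `survfit()` on a `coxph` fit returns the curve at the centering values stored in `$means`. A curve at covariates equal to zero has to be requested explicitly.

```r
library(survival)

# Cox model with a single numeric covariate; lung ships with survival
fit <- coxph(Surv(time, status) ~ age, data = lung)

# Centering values used for the default curve (here the mean age)
print(fit$means)

# Default curve: a "pseudo-subject" whose age equals the sample mean
sf_mean <- survfit(fit)

# Curve extrapolated to age = 0, far outside the observed ages
sf_zero <- survfit(fit, newdata = data.frame(age = 0))

# Compare the two curves at a few follow-up times (in days)
summary(sf_mean, times = c(180, 365))$surv
summary(sf_zero, times = c(180, 365))$surv
```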
The centering at the sample mean is done for three reasons:
1/ it's computationally convenient
2/ it's numerically more stable
3/ it makes the baseline hazard more interpretable, since at least it is the hazard for a set of covariate values somewhere in the interior of your data.

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

From tlumley at uw.edu Sun Oct 2 21:15:55 2011
From: tlumley at uw.edu (Thomas Lumley)
Date: Mon, 3 Oct 2011 08:15:55 +1300
Subject: [R] Advice on approach to weighting survey
In-Reply-To: <8452CFD6AC58614FA9F87C8ADC2E41890D927D79D7@exchange01.lacmta.net>
References: <8452CFD6AC58614FA9F87C8ADC2E41890D927D79D7@exchange01.lacmta.net>
Message-ID:

On Sat, Oct 1, 2011 at 4:59 AM, Farley, Robert wrote:
> I'm about to add weights to a bus on-board survey dataset with ~150 variables and ~28,000 records. My intention is to weight (for each bus "run") by boarding stop and alighting stop. I've seen the Rake function of the Survey package, but it seems that converting to a "svydesign" might be excessive for my purpose.
>
> My dataset has a huge number of unique "Run-Boarding" and "Run-Alighting" groups each with a small number of records to expand. Would it be easier to manually implement Iterative-Proportional-Fitting/Raking/Fratar/Furness on the data? Or are there benefits to converting the data to a svydesign that would make it valuable? This "traditional" weighting expands what we call unlinked (based on each boarding)trips. I'm thinking of also using IPF/Raking to estimate linked (based on each individual) trips. Would this change the consideration of using the svydesign process?
>

If you're planning to do any analysis afterwards it would be useful to have the data in a svydesign object, or if you end up needing to do weight trimming or bounding, or other slightly more complicated weight adjustments.

Otherwise it might well just be easier to do your own IPF algorithm.
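For the do-it-yourself route, here is a minimal sketch of two-margin raking (iterative proportional fitting) in base R. The function name `rake_weights`, the toy boarding/alighting records and the target totals are all hypothetical. The survey package's `rake()` instead works on a svydesign object and handles any number of margins.

```r
# Hypothetical helper: rake weights w so that weighted counts by f1 match
# t1 and weighted counts by f2 match t2. t1 and t2 are named numeric
# vectors whose names are the category labels.
rake_weights <- function(w, f1, f2, t1, t2, tol = 1e-10, maxit = 1000) {
  f1 <- as.character(f1)
  f2 <- as.character(f2)
  for (i in seq_len(maxit)) {
    w_old <- w
    # Scale so the first margin (e.g. boarding stop) hits its targets
    m1 <- c(tapply(w, f1, sum))
    w <- w * t1[f1] / m1[f1]
    # Then scale so the second margin (e.g. alighting stop) hits its targets
    m2 <- c(tapply(w, f2, sum))
    w <- w * t2[f2] / m2[f2]
    # Stop once a full cycle no longer changes the weights appreciably
    if (max(abs(w - w_old)) < tol) break
  }
  as.vector(w)
}

# Toy run: boarding and alighting stop for five surveyed riders
board  <- c("A", "A", "B", "B", "B")
alight <- c("X", "Y", "X", "Y", "Y")
targets_board  <- c(A = 40, B = 60)
targets_alight <- c(X = 30, Y = 70)

w <- rake_weights(rep(1, 5), board, alight, targets_board, targets_alight)
print(round(w, 3))
tapply(w, board, sum)   # close to 40 and 60
tapply(w, alight, sum)  # close to 30 and 70
```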
   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

From mailinglist.honeypot at gmail.com Sun Oct 2 21:21:26 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Sun, 2 Oct 2011 15:21:26 -0400
Subject: [R] difference between createPartition and createfold functions
In-Reply-To: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu>
References: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu>
Message-ID:

Hi,

On Sun, Oct 2, 2011 at 2:47 PM, wrote:
> Hello,
>
> I'm trying to separate my dataset into 4 parts with the 4th one as the test
> dataset, and the other three to fit a model.
>
> I've been searching for the difference between these 2 functions in Caret
> package, but the most I can get is this--
>
> A series of test/training partitions are created using createDataPartition
> while createResample creates one or more bootstrap samples. createFolds
> splits the data into k groups.
>
> I'm missing something here? What is the difference btw createPartition and
> createFold? I guess they wouldn't be equivalent.

Well -- you could always look at the source code to find out (enter the name of the function into your R console and hit return), but you can also do some experimentation to find out. Using the data from the Examples section of caret::createFolds:

R> library(caret)
R> data(oil)
R> part <- createDataPartition(oilType, 2)
R> fold <- createFolds(oilType, 2)

R> length(Reduce(intersect, part))
[1] 27

R> length(Reduce(intersect, fold))
[1] 0

Looks like `createDataPartition` split your data into smaller pieces, but allows the same example to appear in different splits.

`createFolds` doesn't allow the same example to appear in more than one fold.
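The following base-R sketch makes the distinction concrete without depending on caret, and also shows how the returned indices select rows of a data frame. `make_folds()` and `make_partitions()` are hypothetical helpers, not caret functions, and unlike caret they do not stratify by class. Folds put every row in exactly one group, while repeated partitions draw each training set independently, so rows can recur across splits.

```r
# Hypothetical k-fold assignment: every row index lands in exactly one fold
make_folds <- function(n, k) {
  fold_id <- sample(rep(seq_len(k), length.out = n))
  split(seq_len(n), fold_id)
}

# Hypothetical repeated two-way splits: each training set is drawn independently
make_partitions <- function(n, times, p = 0.5) {
  lapply(seq_len(times), function(i) sort(sample.int(n, floor(p * n))))
}

set.seed(42)
dat <- data.frame(x = rnorm(20), cls = rep(c("a", "b"), 10))

folds <- make_folds(nrow(dat), 4)
parts <- make_partitions(nrow(dat), 2)

length(Reduce(intersect, folds))  # 0: folds never share rows
length(Reduce(intersect, parts))  # usually > 0: partitions can overlap

# The indices pick rows of the data frame; here fold 4 is the test set
test  <- dat[folds[[4]], ]
train <- dat[-folds[[4]], ]
```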
HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From kbrownk at gmail.com Sun Oct 2 19:11:22 2011
From: kbrownk at gmail.com (Kerry)
Date: Sun, 2 Oct 2011 10:11:22 -0700 (PDT)
Subject: [R] Scatterplot with the 3rd dimension = color?
Message-ID:

I have 3 columns of data and want to plot each row as a point in a scatter plot and want one column to be represented as a color gradient (e.g. larger values being more red). Anyone know the command or package for this?

Thanks,
KB

From alaios at yahoo.com Sun Oct 2 21:28:37 2011
From: alaios at yahoo.com (Alaios)
Date: Sun, 2 Oct 2011 12:28:37 -0700 (PDT)
Subject: [R] is member
In-Reply-To:
References: <1317399946.71678.YahooMailNeo@web120101.mail.ne1.yahoo.com>
 <1317416803.47635.YahooMailNeo@web120120.mail.ne1.yahoo.com>
Message-ID: <1317583717.23790.YahooMailNeo@web120108.mail.ne1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From tal.galili at gmail.com Sun Oct 2 21:46:15 2011
From: tal.galili at gmail.com (Tal Galili)
Date: Sun, 2 Oct 2011 21:46:15 +0200
Subject: [R] Scatterplot with the 3rd dimension = color?
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From bby2103 at columbia.edu Sun Oct 2 21:54:20 2011
From: bby2103 at columbia.edu (bby2103 at columbia.edu)
Date: Sun, 02 Oct 2011 15:54:20 -0400
Subject: [R] difference between createPartition and createfold functions
In-Reply-To:
References: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu>
Message-ID: <20111002155420.wpg4myszy80s8csc@cubmail.cc.columbia.edu>

Hi Steve,

Thanks for the note. I did try the example and the result didn't make sense to me. For splitting a vector, what you describe is a big difference btw them.
For splitting a dataframe, I now wonder if these 2 functions are the wrong choices. They seem to split the columns, at least in the few things I tried.

Bonnie

Quoting Steve Lianoglou :

> Hi,
>
> On Sun, Oct 2, 2011 at 2:47 PM, wrote:
>> Hello,
>>
>> I'm trying to separate my dataset into 4 parts with the 4th one as the test
>> dataset, and the other three to fit a model.
>>
>> I've been searching for the difference between these 2 functions in Caret
>> package, but the most I can get is this--
>>
>> A series of test/training partitions are created using createDataPartition
>> while createResample creates one or more bootstrap samples. createFolds
>> splits the data into k groups.
>>
>> I'm missing something here? What is the difference btw createPartition and
>> createFold? I guess they wouldn't be equivalent.
>
> Well -- you could always look at the source code to find out (enter
> the name of the function into your R console and hit return), but you
> can also do some experimentation to find out. Using the data from the
> Examples section of caret::createFolds:
>
> R> library(caret)
> R> data(oil)
> R> part <- createDataPartition(oilType, 2)
> R> fold <- createFolds(oilType, 2)
>
> R> length(Reduce(intersect, part))
> [1] 27
>
> R> length(Reduce(intersect, fold))
> [1] 0
>
> Looks like `createDataPartition` split your data into smaller pieces,
> but allows for the same example to appear in different splits.
>
> `createFolds` doesn't allow different examples to appear in different
> splits of the folds.
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>   | Memorial Sloan-Kettering Cancer Center
>   | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>

From murdoch.duncan at gmail.com Sun Oct 2 21:55:39 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Sun, 02 Oct 2011 15:55:39 -0400
Subject: [R] Scatterplot with the 3rd dimension = color?
In-Reply-To:
References:
Message-ID: <4E88C1BB.2030106@gmail.com>

On 11-10-02 1:11 PM, Kerry wrote:
> I have 3 columns of data and want to plot each row as a point in a
> scatter plot and want one column to be represented as a color gradient
> (e.g. larger values being more red). Anyone know the command or
> package for this?

It's not a particularly effective display, but here's how to do it. Use rainbow(101) in place of rev(heat.colors(101)) if you like.

x <- rnorm(10)
y <- rnorm(10)
z <- rnorm(10)
colors <- rev(heat.colors(101))
zcolor <- colors[(z - min(z))/diff(range(z))*100 + 1]
plot(x,y,col=zcolor)

Duncan Murdoch

From tal.galili at gmail.com Sun Oct 2 21:43:18 2011
From: tal.galili at gmail.com (Tal Galili)
Date: Sun, 2 Oct 2011 21:43:18 +0200
Subject: [R] R Studio and Rcmdr/RcmdrPlugins
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From mailinglist.honeypot at gmail.com Sun Oct 2 22:00:58 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Sun, 2 Oct 2011 16:00:58 -0400
Subject: [R] difference between createPartition and createfold functions
In-Reply-To: <20111002155420.wpg4myszy80s8csc@cubmail.cc.columbia.edu>
References: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu>
 <20111002155420.wpg4myszy80s8csc@cubmail.cc.columbia.edu>
Message-ID:

Hi,

On Sun, Oct 2, 2011 at 3:54 PM, wrote:
> Hi Steve,
>
> Thanks for the note. I did try the example and the result didn't make sense
> to me. For splitting a vector, what you describe is a big difference btw
> them. For splitting a dataframe, I now wonder if these 2 functions are the
> wrong choices. They seem to split the columns, at least in the few things I
> tried.

Sorry, I'm a bit confused now as to what you are after.

You don't pass in a data.frame into any of the createFolds/DataPartition functions from the caret package.
You pass in a *vector* of labels, and these functions tell you which indices into the vector to use as examples to hold out (or keep, depending on the value you pass in for the `returnTrain` argument) between each fold/partition of your learning scenario (e.g. cross validation with createFolds).

You would then use these indices to keep (remove) the rows of a data.frame, if that is how you are storing your examples.

Does that make sense?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From crabak at acm.org Sun Oct 2 23:47:10 2011
From: crabak at acm.org (csrabak)
Date: Sun, 2 Oct 2011 18:47:10 -0300
Subject: [R] error while using shapiro.test()
In-Reply-To: <1317486293251-3863205.post@n4.nabble.com>
References: <1317423562223-3861535.post@n4.nabble.com>
 <1317486293251-3863205.post@n4.nabble.com>
Message-ID:

On 1/10/2011 13:24, spicymchaggis101 wrote:
> Thank you very much! your response solved my issue.
>
> I needed to determine the probability of normality for word types per page.
>

You need to ensure this assumption is reasonable for your problem domain: word types per page look like count data to me, and for this kind of data Gaussian distributions are at the very best last-resort approximations.
--
Cesar Rabak

From jholtman at gmail.com Mon Oct 3 00:26:36 2011
From: jholtman at gmail.com (jim holtman)
Date: Sun, 2 Oct 2011 18:26:36 -0400
Subject: [R] Keep ALL duplicate records
In-Reply-To: <1317579523767-3865573.post@n4.nabble.com>
References: <1317564057860-3865136.post@n4.nabble.com>
 <1317579523767-3865573.post@n4.nabble.com>
Message-ID:

Here is a function I use to find all duplicate records

> allDup <- function (value)
{
    duplicated(value) | duplicated(value, fromLast = TRUE)
}
> x
     ID    OS  time
1 userA   Win 12:22
2 userB   OSX 23:22
3 userA   Win 04:44
4 userC Win64 12:28
> x[allDup(x$ID),]
     ID  OS  time
1 userA Win 12:22
3 userA Win 04:44
>

On Sun, Oct 2, 2011 at 2:18 PM, Pete Brecknock wrote:
>
> Erik Svensson wrote:
>>
>> Hello,
>> In a data frame I want to identify ALL duplicate IDs in the example to be
>> able to examine "OS" and "time".
>>
>> (df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
>>   OS=c("Win","OSX","Win", "Win64"),
>>   time=c("12:22","23:22","04:44","12:28")))
>>
>>      ID    OS  time
>> 1 userA   Win 12:22
>> 2 userB   OSX 23:22
>> 3 userA   Win 04:44
>> 4 userC Win64 12:28
>>
>> My desired output is that ALL records with the same IDs are found:
>>
>> userA   Win 12:22
>> userA   Win 04:44
>>
>> preferably by returning logical values (TRUE FALSE TRUE FALSE)
>>
>> Is there a simple way to do that?
>>
>> [-- With duplicated(df$ID) the output will be
>> [1] FALSE FALSE  TRUE FALSE
>> i.e. not all user A records are found
>>
>> With unique(df$ID)
>> [1] userA userB userC
>> Levels: userA userB userC
>> i.e. one of each ID is found --]
>>
>> Erik Svensson
>>
>
>
> How about ...
>
> # All records
> ALL_RECORDS <- df[df$ID==df$ID[duplicated(df$ID)],]
> print(ALL_RECORDS)
>
> # Logical Records
> TRUE_FALSE <- df$ID==df$ID[duplicated(df$ID)]
> print(TRUE_FALSE)
>
> HTH
>
> Pete
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Keep-ALL-duplicate-records-tp3865136p3865573.html
> Sent from the R help mailing list archive at Nabble.com.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

From ggrothendieck at gmail.com Mon Oct 3 00:47:29 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Sun, 2 Oct 2011 18:47:29 -0400
Subject: [R] Find all duplicate records
In-Reply-To: <1317564311771-3865139.post@n4.nabble.com>
References: <1317564311771-3865139.post@n4.nabble.com>
Message-ID:

On Sun, Oct 2, 2011 at 10:05 AM, Erik Svensson wrote:
> Hello,
> In a data frame I want to identify ALL duplicate IDs in the example to be
> able to examine "OS" and "time".
>
> (df<-data.frame(ID=c("userA", "userB", "userA", "userC"),
>  OS=c("Win","OSX","Win", "Win64"),
>  time=c("12:22","23:22","04:44","12:28")))
>
>      ID    OS  time
> 1 userA   Win 12:22
> 2 userB   OSX 23:22
> 3 userA   Win 04:44
> 4 userC Win64 12:28
>
> My desired output is that ALL records with the same IDs are found:
>
> userA   Win 12:22
> userA   Win 04:44
>
> preferably by returning logical values (TRUE FALSE TRUE FALSE)
>

Try this:

> ave(rownames(df), df$ID, FUN = length) > 1
[1]  TRUE FALSE  TRUE FALSE
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From mxkuhn at gmail.com Mon Oct 3 02:45:40 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Sun, 2 Oct 2011 20:45:40 -0400 Subject: [R] difference between createPartition and createfold functions In-Reply-To: References: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu> <20111002155420.wpg4myszy80s8csc@cubmail.cc.columbia.edu> Message-ID: Basically, createDataPartition is used when you need to make one or more simple two-way splits of your data. For example, if you want to make a training and test set and keep your classes balanced, this is what you could use. It can also make multiple splits of this kind (or leave-group-out CV aka Monte Carlos CV aka repeated training test splits). createFolds is exclusively for k-fold CV. Their usage is simular when you use the returnTrain = TRUE option in createFolds. Max On Sun, Oct 2, 2011 at 4:00 PM, Steve Lianoglou wrote: > Hi, > > On Sun, Oct 2, 2011 at 3:54 PM, ? wrote: >> Hi Steve, >> >> Thanks for the note. I did try the example and the result didn't make sense >> to me. For splitting a vector, what you describe is a big difference btw >> them. For splitting a dataframe, I now wonder if these 2 functions are the >> wrong choices. They seem to split the columns, at least in the few things I >> tried. > > Sorry, I'm a bit confused now as to what you are after. > > You don't pass in a data.frame into any of the > createFolds/DataPartition functions from the caret package. > > You pass in a *vector* of labels, and these functions tells you which > indices into the vector to use as examples to hold out (or keep > (depending on the value you pass in for the `returnTrain` argument)) > between each fold/partition of your learning scenario (eg. cross > validation with createFolds). > > You would then use these indices to keep (remove) the rows of a > data.frame, if that is how you are storing your examples. > > Does that make sense? 
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>   | Memorial Sloan-Kettering Cancer Center
>   | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>

--
Max

From therneau at mayo.edu Mon Oct 3 04:06:12 2011
From: therneau at mayo.edu (Terry Therneau)
Date: Sun, 02 Oct 2011 21:06:12 -0500
Subject: [R] Is the output of survfit.coxph survival or baseline survival?
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From venerealdisease2011 at gmail.com Mon Oct 3 03:29:11 2011
From: venerealdisease2011 at gmail.com (venerealdisease)
Date: Sun, 2 Oct 2011 18:29:11 -0700 (PDT)
Subject: [R] about the array transpose
Message-ID: <1317605351283-3866241.post@n4.nabble.com>

Hi, all,

I am a newbie to [R].

Could anyone help me understand how to transpose a 3x3x3 array of 1:27?

Eg.

A<-array(1:27, c(3,3,3))

What is the logic behind transposing it to

B<-aperm(A, c(3,2,1))

I cannot picture how it transposes; could anyone solve my problem?
Most importantly, I want to be able to get the numbers I expect. If I cannot figure this out, I will have a confused concept, which will affect my future learning of 3D models in [R].

Highly appreciated and thanks.

VD

--
View this message in context: http://r.789695.n4.nabble.com/about-the-array-transpose-tp3866241p3866241.html
Sent from the R help mailing list archive at Nabble.com.
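The following is a short sketch of the rule behind `aperm()`. As documented, the result has dimensions `dim(A)[perm]`, so its d-th dimension is dimension `perm[d]` of `A`. For `perm = c(3, 2, 1)` the first and third subscripts therefore swap. The loop only checks that subscript mapping element by element, and the extra array `C`, with a non-symmetric permutation, is added purely for contrast.

```r
A <- array(1:27, dim = c(3, 3, 3))

# aperm(A, perm): dimension d of the result is dimension perm[d] of A.
# With perm = c(3, 2, 1) this means B[i, j, k] == A[k, j, i].
B <- aperm(A, c(3, 2, 1))

# Check the rule for every element
ok <- TRUE
for (i in 1:3) for (j in 1:3) for (k in 1:3) {
  ok <- ok && B[i, j, k] == A[k, j, i]
}
print(ok)       # TRUE

# A[3, 2, 1] sits at column-major position 3 + (2 - 1) * 3 + (1 - 1) * 9 = 6
B[1, 2, 3]      # 6

# Non-symmetric permutation: C[i, j, k] == A[k, i, j]
C <- aperm(A, c(2, 3, 1))
C[1, 2, 3]      # A[3, 1, 2] = 3 + 0 * 3 + 1 * 9 = 12
```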
From nadine.melhem at gmail.com Sun Oct 2 22:00:39 2011
From: nadine.melhem at gmail.com (Nadine Melhem)
Date: Sun, 2 Oct 2011 16:00:39 -0400
Subject: [R] patients.txt data
Message-ID: <72BB5381-94DA-4722-8F25-4FFD08443248@gmail.com>

please send me the "patients.txt" data.
thanks.

From melhnm at UPMC.EDU Sun Oct 2 22:31:21 2011
From: melhnm at UPMC.EDU (Melhem, Nadine)
Date: Sun, 2 Oct 2011 16:31:21 -0400
Subject: [R] patients.txt data
Message-ID: <92F65CB970AB31498965691837E1BF5B0191C300AD@MSXMBXNSPRD19.acct.upmchs.net>

I'm new to learning R. I'm taking a course and will need access to the "patients.txt" data to be able to do the exercises required using this dataset.
thanks.

From kbrownk at gmail.com Sun Oct 2 23:12:33 2011
From: kbrownk at gmail.com (Kerry)
Date: Sun, 2 Oct 2011 14:12:33 -0700 (PDT)
Subject: [R] Scatterplot with the 3rd dimension = color?
In-Reply-To: <4E88C1BB.2030106@gmail.com>
References: <4E88C1BB.2030106@gmail.com>
Message-ID:

Yes, perfect! This I can work with.

Thanks,
KB

On Oct 2, 3:55 pm, Duncan Murdoch wrote:
> On 11-10-02 1:11 PM, Kerry wrote:
>
> > I have 3 columns of data and want to plot each row as a point in a
> > scatter plot and want one column to be represented as a color gradient
> > (e.g. larger values being more red). Anyone know the command or
> > package for this?
>
> It's not a particularly effective display, but here's how to do it. Use
> rainbow(101) in place of rev(heat.colors(101)) if you like.
>
> x <- rnorm(10)
> y <- rnorm(10)
> z <- rnorm(10)
> colors <- rev(heat.colors(101))
> zcolor <- colors[(z - min(z))/diff(range(z))*100 + 1]
> plot(x,y,col=zcolor)
>
> Duncan Murdoch
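An alternative way to map the third column onto colors is sketched below with base R's `findInterval()`. Binning into intervals keeps every index inside the palette, including the maximum value. `z_to_color` is a hypothetical helper name, and the palette follows Duncan's `rev(heat.colors())` choice, so larger values come out redder.

```r
# Hypothetical helper: bin z into n equal-width intervals and return one
# palette color per value (smallest z -> first color, largest -> last)
z_to_color <- function(z, n = 100, palette = rev(heat.colors(n))) {
  breaks <- seq(min(z), max(z), length.out = n + 1)
  # all.inside = TRUE puts max(z) in bin n instead of bin n + 1
  palette[findInterval(z, breaks, all.inside = TRUE)]
}

set.seed(1)
x <- rnorm(50)
y <- rnorm(50)
z <- rnorm(50)
plot(x, y, col = z_to_color(z), pch = 19)
```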
From bbolker at gmail.com Mon Oct 3 04:42:03 2011
From: bbolker at gmail.com (Ben Bolker)
Date: Mon, 3 Oct 2011 02:42:03 +0000
Subject: [R] Scatterplot with the 3rd dimension = color?
References: <4E88C1BB.2030106@gmail.com>
Message-ID:

Duncan Murdoch <murdoch.duncan at gmail.com> writes:

>
> On 11-10-02 1:11 PM, Kerry wrote:
> > I have 3 columns of data and want to plot each row as a point in a
> > scatter plot and want one column to be represented as a color gradient
> > (e.g. larger values being more red). Anyone know the command or
> > package for this?
>
> It's not a particularly effective display, but here's how to do it. Use
> rainbow(101) in place of rev(heat.colors(101)) if you like.
>
> x <- rnorm(10)
> y <- rnorm(10)
> z <- rnorm(10)
> colors <- rev(heat.colors(101))
> zcolor <- colors[(z - min(z))/diff(range(z))*100 + 1]
> plot(x,y,col=zcolor)
>

or

d <- data.frame(x,y,z)
library(ggplot2)
qplot(x,y,colour=z,data=d)

I agree about the "not particularly effective display" comment, but if you have two continuous predictors and a continuous response you've got a tough display problem -- your choices are:

1. use color, size, or some other graphical characteristic (pretty far down on the "Cleveland hierarchy")
2. use a perspective plot (hard to get the right viewing angle, often confusing)
3. use coplots/small multiples/faceting (requires discretizing one dimension)

From xenon99 at hotmail.com Mon Oct 3 05:41:21 2011
From: xenon99 at hotmail.com (Darius H)
Date: Mon, 3 Oct 2011 03:41:21 +0000
Subject: [R] rolling regression
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From mailinglist.honeypot at gmail.com Mon Oct 3 05:50:26 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Sun, 2 Oct 2011 23:50:26 -0400
Subject: [R] patients.txt data
In-Reply-To: <92F65CB970AB31498965691837E1BF5B0191C300AD@MSXMBXNSPRD19.acct.upmchs.net>
References: <92F65CB970AB31498965691837E1BF5B0191C300AD@MSXMBXNSPRD19.acct.upmchs.net>
Message-ID:

Hi,

On Sun, Oct 2, 2011 at 4:31 PM, Melhem, Nadine wrote:
> I'm new to learning R. I'm taking a course and will need access to the "patients.txt" data to be able to do the exercises required using this dataset.

Without more context, I'm doubtful that anybody will be able to help you.

I reckon your best bet will be to ask your instructor where you can find this sample data.

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From nevil.amos at gmail.com Mon Oct 3 06:49:49 2011
From: nevil.amos at gmail.com (Nevil Amos)
Date: Mon, 03 Oct 2011 15:49:49 +1100
Subject: [R] How to format R superscript 2 followed by "=" value
Message-ID: <4E893EED.6050804@monash.edu>

I am trying to put an R2 value with R2 formatted with a superscript 2 followed by "=" and the value:
the first mtext prints the R2 correctly formatted but follows it with "=round(summary(mylm)$r.squared,3)))" as text;
the second prints "R^2 =" followed by the value of round(summary(mylm)$r.squared,3))).

How do I correctly write the expression to get formatted R2 followed by the value?
x=runif(10)
y=runif(10)
summary(mylm<-lm(y~x))
plot(x,y)
abline(mylm)
mtext(expression(paste(R^2,"=",round(summary(mylm)$r.squared,3))),1)
mtext(paste(expression(R^2),"=",round(summary(mylm)$r.squared,3)),3)

thanks

Nevil Amos

From larapoplarski at gmail.com Mon Oct 3 07:16:30 2011 From: larapoplarski at gmail.com (Lara Poplarski) Date: Sun, 2 Oct 2011 22:16:30 -0700 Subject: [R] function recode within sapply Message-ID:

Dear List,

I am using function recode, from package car, within sapply, as follows:

L3 <- LETTERS[1:3]
(d <- data.frame(cbind(x = 1, y = 1:10), fac1 = sample(L3, 10, replace=TRUE), fac2 = sample(L3, 10, replace=TRUE), fac3 = sample(L3, 10, replace=TRUE)))
str(d)

d[, c("fac1", "fac2")] <- sapply(d[, c("fac1", "fac2")], recode, "c('A', 'B') = 'XX'", as.factor.result = TRUE)
d[, "fac3"] <- recode(d[, "fac3"], "c('A', 'B') = 'XX'")
str(d)

However, the class of columns fac1 and fac2 is "character" as opposed to "factor", even though I specify the option "as.factor.result = TRUE"; this option works fine with a single column.

Any thoughts?

Many thanks,
Lara

From jwiley.psych at gmail.com Mon Oct 3 07:39:19 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Sun, 2 Oct 2011 22:39:19 -0700 Subject: [R] How to format R superscript 2 followed by "=" value In-Reply-To: <4E893EED.6050804@monash.edu> References: <4E893EED.6050804@monash.edu> Message-ID:

Hi Nevil,

Here is one option:

################################
## function definition
r2format <- function(object, digits = 3, output, sub, expression = TRUE, ...)
{
  if (inherits(object, "lm")) {
    x <- summary(object)
  } else if (inherits(object, "summary.lm")) {
    x <- object
  } else stop("object is an unmanageable class")

  out <- format(x$r.squared, digits = digits)

  if (!missing(output)) {
    output <- gsub(sub, out, output)
  } else {
    output <- out
  }

  if (expression) {
    output <- parse(text = output)
  }

  return(output)
}

## model
m <- lm(mpg ~ hp * wt, data = mtcars)

## demonstration
r2format(object = m, output = "R^2 == rval", sub = "rval", expression = TRUE)

## your problem
x <- runif(10)
y <- runif(10)
mylm <- lm(y ~ x)
plot(x, y)
abline(mylm)
## simplified version of demo, applied to your model
mtext(r2format(mylm, 3, "R^2 == rval", "rval"), 3)
################################

The real key is using == instead of "=". The lengthy response is because I have been toying and working with different stylers and formatters to try to facilitate getting output from R into publication format, so I was interested in playing with this and thinking about what might be useful abstractions. Anyway, more specific to your usage might be something like:

substitute(R^2 == rval, list(rval = round(summary(mylm)$r.squared, 3)))

Cheers,

Josh

On Sun, Oct 2, 2011 at 9:49 PM, Nevil Amos wrote:
> I am trying to put an
> R2 value with R2 formatted with a superscript 2 followed by "=" and the
> value :
> the first mtext prints the R2 correctly formatted but follows it with
> "=round(summary(mylm)$r.squared,3)))" as text
> the second prints "R^2 =" followed by the value of
> round(summary(mylm)$r.squared,3))).
>
> how do I correctly write the expression to get formatted r2 followed by the
> value?
>
> x=runif(10)
> y=runif(10)
> summary(mylm<-lm(y~x))
> plot(x,y)
> abline(mylm)
> mtext(expression(paste(R^2,"=",round(summary(mylm)$r.squared,3))),1)
> mtext(paste(expression(R^2),"=",round(summary(mylm)$r.squared,3)),3)
>
> thanks
>
> Nevil Amos
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

From jwiley.psych at gmail.com Mon Oct 3 07:45:58 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Sun, 2 Oct 2011 22:45:58 -0700 Subject: [R] function recode within sapply In-Reply-To: References: Message-ID:

Hi Lara,

Use lapply here instead of sapply, or specify simplify = FALSE. See ?sapply for details.

d[, c("fac1", "fac2")] <- lapply(d[, c("fac1", "fac2")], recode, "c('A', 'B') = 'XX'", as.factor.result = TRUE)
d[, "fac3"] <- recode(d[, "fac3"], "c('A', 'B') = 'XX'")
str(d)

Cheers,

Josh

On Sun, Oct 2, 2011 at 10:16 PM, Lara Poplarski wrote:
> Dear List,
>
> I am using function recode, from package car, within sapply, as follows:
>
> L3 <- LETTERS[1:3]
> (d <- data.frame(cbind(x = 1, y = 1:10), fac1 = sample(L3, 10,
> replace=TRUE), fac2 = sample(L3, 10, replace=TRUE), fac3 = sample(L3,
> 10, replace=TRUE)))
> str(d)
>
> d[, c("fac1", "fac2")] <- sapply(d[, c("fac1", "fac2")], recode,
> "c('A', 'B') = 'XX'", as.factor.result = TRUE)
> d[, "fac3"] <- recode(d[, "fac3"], "c('A', 'B') = 'XX'")
> str(d)
>
> However, the class of columns fac1 and fac2 is "character" as opposed
> to "factor", even though I specify the option "as.factor.result =
> TRUE"; this option works fine with a single column.
>
> Any thoughts?
>
> Many thanks,
> Lara
>

From datashaping at gmail.com Mon Oct 3 05:52:53 2011 From: datashaping at gmail.com (dataguru) Date: Sun, 2 Oct 2011 20:52:53 -0700 (PDT) Subject: [R] New random number generator (RNG) Message-ID: <20293a43-2717-4e49-9b31-7f909fffa35b@i23g2000yqm.googlegroups.com>

Based on very fast converging series for special transcendental numbers. Is there some R code available?

See details about the RNG at http://www.analyticbridge.com/profiles/blogs/new-state-of-the-art-random-number-generator-simple-strong-and-fa

From nevil.amos at gmail.com Mon Oct 3 06:02:28 2011 From: nevil.amos at gmail.com (Nevil Amos) Date: Mon, 03 Oct 2011 15:02:28 +1100 Subject: [R] How to format Rsuperscript 2 followed by = value Message-ID: <4E8933D4.9050707@monash.edu>

I am trying to put an R2 value with R2 formatted with a superscript 2, followed by "=" and the value. The first mtext prints the R2 correctly formatted but follows it with "=round(summary(mylm)$r.squared,3)))" as text; the second prints "R^2 =" followed by the value of round(summary(mylm)$r.squared,3))).

How do I correctly write the expression to get the formatted R2 followed by the value?
x=runif(10)
y=runif(10)
summary(mylm<-lm(y~x))
plot(x,y)
abline(mylm)
mtext(expression(paste(R^2,"=",round(summary(mylm)$r.squared,3))),1)
mtext(paste(expression(R^2),"=",round(summary(mylm)$r.squared,3)),3)

thanks

Nevil Amos

From nfaux at unimelb.edu.au Mon Oct 3 06:22:47 2011 From: nfaux at unimelb.edu.au (Noel Faux) Date: Mon, 3 Oct 2011 15:22:47 +1100 Subject: [R] Unable to load local library via GUI Message-ID: <3502B28D-274D-4A72-A49E-27D91AEE1698@unimelb.edu.au> An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From John.Morrongiello at csiro.au Mon Oct 3 07:33:59 2011 From: John.Morrongiello at csiro.au (John.Morrongiello at csiro.au) Date: Mon, 3 Oct 2011 16:33:59 +1100 Subject: [R] new standardised variable based on group membership Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From K.Soetaert at nioo.knaw.nl Mon Oct 3 09:18:10 2011 From: K.Soetaert at nioo.knaw.nl (Soetaert, Karline) Date: Mon, 3 Oct 2011 09:18:10 +0200 Subject: [R] deSolve - Function daspk on DAE system - Error (Vince) Message-ID: <65F6E1EC64DCA6489800C09A2007FC6E04241AA9@cememail1.nioo.int>

Vince,

When that happens, one possible reason is that your DAE is of index > 1, which cannot be solved by daspk. The solver radau, also from deSolve, can handle DAEs up to index 3, but you need to rewrite the problem in the form M*y' = f(x,y), where M is a mass matrix.

If you do that for your problem and solve it with radau, radau complains that the "matrix is repeatedly singular" and that the problem is too stiff, and it stops just like daspk. I think this means that this particular DAE is unsolvable, so you will need to look at the formulation itself.
By the way, there is a special R mailing list that deals with this type of problem: r-sig-dynamic-models at r-project.org

Hope this helps,

Karline

---------------------
Original message:
Date: Sat, 1 Oct 2011 20:20:10 -0700 (PDT)
From: Vince
To: r-help at r-project.org
Subject: [R] deSolve - Function daspk on DAE system - Error
Message-ID: <1317525610060-3864298.post at n4.nabble.com>
Content-Type: text/plain; charset=us-ascii

I'm getting this error on the attached code and breaking my head but can't figure it out. Any help is much appreciated.

Thanks,
Vince

CODE:

library(deSolve)

Res_DAE=function(t, y, dy, pars) {
  with(as.list(c(y, dy, pars)), {
    res1 = -dS -dES-k2*ES
    res2 = -dP + k2*ES
    eq1 = Eo-E -ES
    eq2 = So-S -ES -P
    return(list(c(res1, res2, eq1, eq2)))
  })
}

pars <- c(Eo=0.02, So=0.02, k2=250, E=0.01); pars
yini <- c(S=0.01, ES = 0.01, P=0.0, E=0.01); yini
times <- seq(0, 0.01, by = 0.0001); times
dyini = c(dS=0.0, dES=0.0, dP=0.0)

## Tabular output check of matrix output
DAE <- daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, atol = 1e-10, rtol = 1e-10)

ERROR:
daspk-- warning.. At T(=R1) and stepsize H (=R2) the nonlinear solver f nonlinear solver failed to converge repeatedly of with abs (H) = H repeatedly of with abs (H) = HMIN preconditioner had repeated failur 0.0000000000000D+00 0.5960464477539D-14
Warning messages:
1: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, : repeated convergence test failures on a step - inaccurate Jacobian or preconditioner?
2: In daspk(y = yini, dy = dyini, times = times, res = Res_DAE, parms = pars, : Returning early. Results are accurate, as far as they go

From Thierry.ONKELINX at inbo.be Mon Oct 3 09:35:35 2011 From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry) Date: Mon, 3 Oct 2011 07:35:35 +0000 Subject: [R] new standardised variable based on group membership In-Reply-To: References: Message-ID:

Dear John,

You need to combine scale with a grouping function.
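A base-R-only sketch of that idea, using the built-in Orange data as a stand-in for John's fish (Tree plays the role of the individual fish). It standardises circumference rather than age, because in Orange every tree is measured at the same ages, so age alone would not show the grouping at work. The helper zscore and the column name circZ are illustrative names, not part of any package:

```r
# Per-group z-scores with base R only: ave() applies FUN within each
# level of the grouping factor and returns a vector in the original row order.
data(Orange)  # built-in datasets copy: columns Tree, age, circumference

# Illustrative helper: (y_i - mean) / sd within one group
zscore <- function(v) (v - mean(v)) / sd(v)

Orange$circZ <- ave(Orange$circumference, Orange$Tree, FUN = zscore)

# Check: within every tree the standardised values have mean 0 and sd 1
print(round(tapply(Orange$circZ, Orange$Tree, mean), 10))
print(tapply(Orange$circZ, Orange$Tree, sd))
```

Because ave() keeps the original row order, the new column can be attached directly without sorting the data by group first.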
data(Orange)
library(plyr)
Orange <- ddply(Orange, .(Tree), function(x){
  x$ddplyAge <- scale(x$age)[, 1]
  x
})
Orange$aveAge <- ave(Orange$age, by = Orange$Tree, FUN = scale)
all.equal(Orange$ddplyAge, Orange$aveAge)

Best regards,

Thierry

> -----Original message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On behalf of John.Morrongiello at csiro.au
> Sent: Monday 3 October 2011 7:34
> To: r-help at r-project.org
> Subject: [R] new standardised variable based on group membership
>
> Hi
> I have a dataset comprising repeated measures of growth (5-15 records per
> individual) for 580 fish (similar to the Orange dataset from the nlme library). I would like
> to standardise these growth measures ((yi - mu)/sd) using the mean and standard
> deviation unique to each fish. Can someone suggest a function that would help
> me do this? I've had a look at scale and sweep but can't find a worked example
> that does what I'm after.
>
> Cheers
>
> John

From landronimirc at gmail.com Mon Oct 3 09:47:25 2011 From: landronimirc at gmail.com (Liviu Andronic) Date: Mon, 3 Oct 2011 09:47:25 +0200 Subject: [R] Understanding the workflow between sweave, R and Latex In-Reply-To: <4E85B8E7.3040506@gmail.com> References: <1317384235559-3859612.post@n4.nabble.com> <4E85B8E7.3040506@gmail.com> Message-ID:

On Fri, Sep 30, 2011 at 2:41 PM, Duncan Murdoch wrote:
> As an aside, I don't recommend the workflow you describe: it's very slow
> and cumbersome. It's much better to tell your text editor how to run both
> Sweave and Latex in one command. In the upcoming release of R 2.14.0, this
>
Another approach is to use LyX. The latest stable release comes with an Sweave module that provides out-of-the-box support for Sweave documents.
Once everything is configured (and on a Mac it should be fairly straightforward in this case), compiling documents is usually a matter of pressing a button or activating a key combination. LyX takes care of a lot of automation for you, including BibTeX et al.

Regards
Liviu

From landronimirc at gmail.com Mon Oct 3 09:53:19 2011 From: landronimirc at gmail.com (Liviu Andronic) Date: Mon, 3 Oct 2011 09:53:19 +0200 Subject: [R] extracting p-values in scientific notation Message-ID:

Dear all
How does print.htest display the p-value in scientific notation?
> (x <- cor.test(iris[[1]], iris[[3]]))

        Pearson's product-moment correlation

data:  iris[[1]] and iris[[3]]
t = 21.65, df = 148, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8270 0.9055
sample estimates:
   cor
0.8718

Above, the p-value comes out as '< 2.2e-16', while inspecting the object I get a good old '0'.
> x$p.value
[1] 0

I tried to inspect print.htest but couldn't find it. I also played with format, round and the like to no avail. Any pointers?

Regards
Liviu

--
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

From paul.hiemstra at knmi.nl Mon Oct 3 10:08:40 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Mon, 03 Oct 2011 08:08:40 +0000 Subject: [R] Gstat - Installation Fail _ download source and compile help ... In-Reply-To: References: Message-ID: <4E896D88.80105@knmi.nl>

Hi Sandeep,

You need to provide way more detail for us to be able to help you:

- Which version of gstat do you want to install?
- Where did you get the sources?
- Is it the version inside R or the standalone version?
- What version of R are you running?
- *What is the exact error that gstat gave*?
- Have you tried the precompiled gstat standalone binaries?
Furthermore, this is not the correct forum to get help for gstat; this is far more appropriate for the R-sig-geo mailing list. Another helpful mailing list is: http://www2.52north.org/mailman/listinfo/geostatistics

good luck,
Paul

On 10/01/2011 05:03 PM, Sandeep Patil wrote:
> Hello
>
> I have been trying to install gstat on the university's Unix-based system (I am
> not familiar with many technical aspects of installation) but I am getting a
> particular error which I could not find a solution to online. Here is what
> the technical support guy mailed me back; I am sure someone who understands
> the technicalities can explain this procedure to me in a more lucid way.
>
> *Technical Assistant's reply*
>
> Unfortunately, the error is due to a type being used in one of the
> source files which has not yet been defined in an include file.
> The "u_int" type is defined in /usr/include/sys/types.h:
>
> typedef __u_int u_int;
>
> And, the "__u_int" type is defined in /usr/include/bits/types.h:
>
> typedef unsigned int __u_int;
>
> Note that <bits/types.h> is included at the top of <sys/types.h>, so
> only the <sys/types.h> would need to be included.
>
> Without including <sys/types.h>, the program won't recognize
> "u_int" as a valid type. So, this is an issue with the configuration
> or perhaps source for the given program being compiled by the
> package installation function of R.
>
> My suggestion would be to search for the given error message on any
> support/help/discussion boards/websites related to the R program.
> Or, do a google search to see if anyone else has encountered the same
> error and find their suggested solution.
>
> Otherwise, you can manually download the source to your directory and
> attempt to tweak the "configure" command, which would generate a more
> correct Makefile. Or, in the least desirable scenario, insert the
> needed "#include <sys/types.h>" in the given *.c file yourself and
> compile.
>
> Can anyone make out anything from this? I want to tweak the configure
> command but do not know how to proceed.
>

--
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770

From rolf.turner at xtra.co.nz Mon Oct 3 10:25:46 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Mon, 03 Oct 2011 21:25:46 +1300 Subject: [R] extracting p-values in scientific notation In-Reply-To: References: Message-ID: <4E89718A.9060804@xtra.co.nz>

Isn't it true that 0 < 2.2e-16?

cheers,

Rolf Turner

On 03/10/11 20:53, Liviu Andronic wrote:
> Dear all
> How does print.htest display the p-value in scientific notation?
>> (x<- cor.test(iris[[1]], iris[[3]]))
> Pearson's product-moment correlation
>
> data: iris[[1]] and iris[[3]]
> t = 21.65, df = 148, p-value< 2.2e-16
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
> 0.8270 0.9055
> sample estimates:
> cor
> 0.8718
>
> Above the p-value comes as '< 2.2e-16', while inspecting the object I
> get a good old '0'.
>> x$p.value
> [1] 0
>
> I tried to inspect print.htest but couldn't find it. I also played
> with format, round and the like to no avail. Any pointers?
>
> Regards
> Liviu
>

From lebatsnok at gmail.com Mon Oct 3 10:48:31 2011 From: lebatsnok at gmail.com (Kenn Konstabel) Date: Mon, 3 Oct 2011 11:48:31 +0300 Subject: [R] extracting p-values in scientific notation In-Reply-To: References: Message-ID:

> is(x)
[1] "htest"
> # take a look at stats:::print.htest
> format.pval(x$p.value)
[1] "< 2.22e-16"

Does that answer your question?
KK

On Mon, Oct 3, 2011 at 10:53 AM, Liviu Andronic wrote:
> Dear all
> How does print.htest display the p-value in scientific notation?
>> (x <- cor.test(iris[[1]], iris[[3]]))
>
>         Pearson's product-moment correlation
>
> data:  iris[[1]] and iris[[3]]
> t = 21.65, df = 148, p-value < 2.2e-16
> alternative hypothesis: true correlation is not equal to 0
> 95 percent confidence interval:
>  0.8270 0.9055
> sample estimates:
>    cor
> 0.8718
>
> Above the p-value comes as '< 2.2e-16', while inspecting the object I
> get a good old '0'.
>> x$p.value
> [1] 0
>
> I tried to inspect print.htest but couldn't find it. I also played
> with format, round and the like to no avail. Any pointers?
>
> Regards
> Liviu
>
> --
> Do you know how to read?
> http://www.alienetworks.com/srtest.cfm
> http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
> Do you know how to write?
> http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
>

From paul.hiemstra at knmi.nl Mon Oct 3 10:54:11 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Mon, 03 Oct 2011 08:54:11 +0000 Subject: [R] patients.txt data In-Reply-To: <72BB5381-94DA-4722-8F25-4FFD08443248@gmail.com> References: <72BB5381-94DA-4722-8F25-4FFD08443248@gmail.com> Message-ID: <4E897833.30704@knmi.nl>

On 10/02/2011 08:00 PM, Nadine Melhem wrote:
> please send me the "patients.txt" data.
>
> thanks.
>
I'm not really sure why you sent this kind of e-mail to the R-help list. If you need a dataset, ask your instructor to provide it for you. And: PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Paul

From ted.harding at wlandres.net Mon Oct 3 11:05:54 2011 From: ted.harding at wlandres.net ( (Ted Harding)) Date: Mon, 03 Oct 2011 10:05:54 +0100 (BST) Subject: [R] extracting p-values in scientific notation In-Reply-To: Message-ID:

One point to note, for information, in this discussion is that cor.test() has apparently returned the P-value as an exact zero:

x$p.value == 0
# [1] TRUE
identical(x$p.value, 0)
# [1] TRUE

(which, by the way, I was led to after trying log10(x$p.value) and getting -Inf).

Perhaps a more interesting question is how cor.test computes the P-value!

Ted.

On 03-Oct-11 08:48:31, Kenn Konstabel wrote:
>> is(x)
> [1] "htest"
>> # take a look at stats:::print.htest
>> format.pval(x$p.value)
> [1] "< 2.22e-16"
>
> Does that answer your question?
>
> KK
>
> On Mon, Oct 3, 2011 at 10:53 AM, Liviu Andronic
> wrote:
>> Dear all
>> How does print.htest display the p-value in scientific notation?
>>> (x <- cor.test(iris[[1]], iris[[3]]))
>>
>>         Pearson's product-moment correlation
>>
>> data:  iris[[1]] and iris[[3]]
>> t = 21.65, df = 148, p-value < 2.2e-16
>> alternative hypothesis: true correlation is not equal to 0
>> 95 percent confidence interval:
>>  0.8270 0.9055
>> sample estimates:
>>    cor
>> 0.8718
>>
>> Above the p-value comes as '< 2.2e-16', while inspecting the object I
>> get a good old '0'.
>>> x$p.value
>> [1] 0
>>
>> I tried to inspect print.htest but couldn't find it. I also played
>> with format, round and the like to no avail. Any pointers?
>>
>> Regards
>> Liviu
>>
>>
>> --
>> Do you know how to read?
>> http://www.alienetworks.com/srtest.cfm
>> http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
>> Do you know how to write?
>> http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
>>

--------------------------------------------------------------------
E-Mail: (Ted Harding)
Fax-to-email: +44 (0)870 094 0861
Date: 03-Oct-11 Time: 10:05:50
------------------------------ XFMail ------------------------------

From paul.hiemstra at knmi.nl Mon Oct 3 11:06:40 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Mon, 03 Oct 2011 09:06:40 +0000 Subject: [R] ggplot2 - extracting values of smooth In-Reply-To: References: Message-ID: <4E897B20.9070001@knmi.nl>

On 09/30/2011 04:39 PM, dM/ wrote:
> Suppose that I'm working on Hadley's diamond dataset and I want to
> review the relationship between price, colour and carat.
>
> I might run the following:
>
> library(ggplot2)
>
> #plot scatter and add some hex binning
> q<-qplot(carat,price,data=diamonds, geom=c("hex"),
> main="Variability of Diamond Prices by Carat and Colour")
>
> #facet to get one scatter for each colour, plus overlay a black coloured loess smoothed line showing the trends in the data
>
> q +
> facet_wrap(~color,ncol=2)+geom_smooth(aes(group=1),colour=I("black"))
>
> Nice picture, but how do I extract the values of the smoothed line?
>
> Many thanks, dM/
>

Hi,

geom_smooth uses R functions to calculate the smooth line. Check out ?stat_smooth for more details. You can run these commands outside ggplot to get the values of the smoothed line, e.g.:

library(ggplot2)

# Make the plot
ggplot(aes(x = speed, y = dist), data = cars) + geom_point() + stat_smooth(method = "loess")

# Get the values
smooth_vals = predict(loess(dist~speed,cars), cars$speed)

Getting the values for other smoothing functions follows this same recipe.

good luck,
Paul

From paul.hiemstra at knmi.nl Mon Oct 3 11:07:56 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Mon, 03 Oct 2011 09:07:56 +0000 Subject: [R] ggplot2 - extracting values of smooth In-Reply-To: References: Message-ID: <4E897B6C.4000504@knmi.nl>

On 09/30/2011 04:39 PM, dM/ wrote:
> Suppose that I'm working on Hadley's diamond dataset and I want to
> review the relationship between price, colour and carat.
>
> I might run the following:
>
> library(ggplot2)
>
> #plot scatter and add some hex binning
> q<-qplot(carat,price,data=diamonds, geom=c("hex"),
> main="Variability of Diamond Prices by Carat and Colour")
>
> #facet to get one scatter for each colour, plus overlay a black coloured loess smoothed line showing the trends in the data
>
> q +
> facet_wrap(~color,ncol=2)+geom_smooth(aes(group=1),colour=I("black"))
>
> Nice picture, but how do I extract the values of the smoothed line?
>
> Many thanks, dM/
>

...and if you want to extract the smoothed lines per factor (as when using facet_wrap), use ddply.

good luck,
Paul

From marion.wenty at gmail.com Mon Oct 3 11:21:18 2011 From: marion.wenty at gmail.com (Marion Wenty) Date: Mon, 3 Oct 2011 11:21:18 +0200 Subject: [R] open source editor for r for beginners In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From landronimirc at gmail.com Mon Oct 3 11:24:05 2011 From: landronimirc at gmail.com (Liviu Andronic) Date: Mon, 3 Oct 2011 11:24:05 +0200 Subject: [R] extracting p-values in scientific notation In-Reply-To: <4E89718A.9060804@xtra.co.nz> References: <4E89718A.9060804@xtra.co.nz> Message-ID:

Thanks all for your pointers.
The following does the trick:
> base::format.pval(x$p.value) ##Hmisc also has such a function
[1] "<2e-16"

On Mon, Oct 3, 2011 at 10:25 AM, Rolf Turner wrote:
> Isn't it true that 0 < 2.2e-16?
>

Yes, but it doesn't mean that the p-value actually hits absolute zero. And cor.test, as Ted noticed, returns
> identical(x$p.value, 0)
[1] TRUE

Not that this makes a great practical difference in my case, but I would still prefer to print "<2e-16" in my Sweave document.

Regards
Liviu

From petr.pikal at precheza.cz Mon Oct 3 11:25:43 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Mon, 3 Oct 2011 11:25:43 +0200 Subject: [R] Help with cast/reshape In-Reply-To: <1317444256.16600.YahooMailNeo@web121819.mail.ne1.yahoo.com> References: <1317444256.16600.YahooMailNeo@web121819.mail.ne1.yahoo.com> Message-ID:

Hi

>
> I realize that this is terribly basic, but I just don't seem to see it at
> this moment, so I would very much appreciate your help.
>
>
> How shall I transform this dataframe:
>
> > df1
>   Name Index Value
> 1    a     1   0.1
> 2    a     2   0.2
> 3    a     3   0.3
> 4    a     4   0.4
> 5    b     1   2.1
> 6    b     2   2.2
> 7    b     3   2.3
> 8    b     4   2.4
>
>
> into this dataframe:
>
> > df2
>   Index   a   b
> 1     1 0.1 2.1
> 2     2 0.2 2.2
> 3     3 0.3 2.3
> 4     4 0.4 2.4
>

I have not seen an answer, so I believe you are looking for:

cast(df1, Index~Name, value = "Value")

Regards
Petr

>
> df1 = data.frame(c("a", "a", "a", "a", "b", "b", "b", "b"), c(1,2,3,4,1,2,3,4), c(0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4))
> colnames(df1) = c("Name", "Index", "Value")
>
> df2 = data.frame(c(1,2,3,4), c(0.1, 0.2, 0.3, 0.4), c(2.1, 2.2, 2.3, 2.4))
> colnames(df2) = c("Index", "a", "b")
>
>
> Thank you very much.
>
> Dana
>
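Where the reshape package is not installed, base R's reshape() (in stats) can do the same long-to-wide conversion. A minimal sketch, assuming the df1 layout from Dana's message (rebuilt compactly here); reshape() names the new columns Value.a and Value.b, so the prefix is stripped afterwards to match df2:

```r
# Long-to-wide with base R's reshape(); no extra packages needed.
df1 <- data.frame(Name  = rep(c("a", "b"), each = 4),
                  Index = rep(1:4, times = 2),
                  Value = c(0.1, 0.2, 0.3, 0.4, 2.1, 2.2, 2.3, 2.4))

# idvar identifies the rows of the result; timevar supplies the new columns
df2 <- reshape(df1, idvar = "Index", timevar = "Name", direction = "wide")

# reshape() produces "Value.a" and "Value.b"; drop the "Value." prefix
names(df2) <- sub("^Value\\.", "", names(df2))
rownames(df2) <- NULL
print(df2)
```

The cast() call above is shorter; reshape() is mainly useful when adding a package dependency is not an option.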
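Returning to the p-value thread: print.htest formats the p-value with format.pval() (see stats:::print.htest, as Kenn suggests). Any value below eps, which defaults to .Machine$double.eps (about 2.22e-16), is shown as "< eps", printed with fewer digits than requested, so the digits setting determines whether you see "< 2.22e-16", "< 2.2e-16" or "<2e-16". A short sketch; the digits values are chosen only for illustration:

```r
# format.pval() reports p-values below 'eps' as "< eps" instead of 0.
x <- cor.test(iris[[1]], iris[[3]])

# The stored value is either exactly 0 or merely tiny; either way it is below eps
print(x$p.value < .Machine$double.eps)

print(format.pval(x$p.value, digits = 4))  # "< 2.2e-16"
print(format.pval(x$p.value, digits = 3))  # "<2e-16"
print(format.pval(0.0123, digits = 3))     # "0.0123"

# A larger eps reports everything below it as "< eps"
print(format.pval(c(0.5, 1e-6), eps = 1e-4, digits = 3))
```

Because format.pval() returns a character string, its result can be pasted directly into a Sweave document.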
From gavin.simpson at ucl.ac.uk Mon Oct 3 12:08:02 2011 From: gavin.simpson at ucl.ac.uk (Gavin Simpson) Date: Mon, 03 Oct 2011 11:08:02 +0100 Subject: [R] Interaction plot type=o In-Reply-To: <3F376AC9-A435-4C5D-B285-7C2153CCA00D@comcast.net> References: <5D09650D-13A2-4FD6-8C79-E28210A4083C@comcast.net> <3F376AC9-A435-4C5D-B285-7C2153CCA00D@comcast.net> Message-ID: <1317636482.1993.39.camel@desktop.localdomain>

On Fri, 2011-09-30 at 12:33 -0400, David Winsemius wrote:
> On Sep 30, 2011, at 2:16 AM, Petr PIKAL wrote:
>
> >>
> >> David,
> >> thank you for your reply
> >>
> >> I tried this
> >> attach(mtcars)
> >> interaction.plot(cyl, gear, mpg, type="o", pch=5:8, lty=1 )
> >>
> >> but I got this error:
> >> Error in match.arg(type) : 'arg' should be one of "l", "p", "b"
> >>
> >> and in ?interaction.plot, "o" it is not listed in type arguments.
> >> Is there any other way to force it to take the argument?
> >
> > As David suggested you need to change code for interaction.plot. If
> > you write
> >
> > interaction.plot
> >
> > you will get the code
> >
> >> interaction.plot
> > function (x.factor, trace.factor, response, fun = mean, type = c("l",
> > "p", "b"), legend = TRUE, trace.label = deparse(substitute(trace.factor)),
> > fixed = FALSE, xlab = deparse(substitute(x.factor)), ylab = ylabel,
> > ylim = range(cells, na.rm = TRUE), lty = nc:1, col = 1, pch = c(1L:9,
> > 0, letters), xpd = NULL, leg.bg = par("bg"), leg.bty = "n",
> > xtick = FALSE, xaxt = par("xaxt"), axes = TRUE, ...)
> > {.......
> >
> > Copy it into some suitable text editor (not Word please) and change it
> > according to your wish.

Or use `fix()` or `fixInNamespace()` if this is just a one off.

G

> You can do it at the console on both the OS versions of R I have used.
> just copy the code and paste. Add the "<-" and the ,"o" and hit enter.
> Piece of cake.
>
> > I would start with adding "o" to type in function
> > definition and see how it behaves.
>
> I tested it.
> Worked as expected.
>
> > Then you can copy the whole code to
> > your modified function e.g.
> >
> > my.int.plot <- function(x,....
> >
> > and call
> > my.int.plot(cyl, gear, mpg, type="o", pch=5:8, lty=1 )
> >
> > Regards
> > Petr
> >
> >
> >>
> >> Thanks
> >> Claudio
> >>
> >> On Thu, Sep 29, 2011 at 9:00 PM, David Winsemius
> > wrote:
> >>
> >>>
> >>> On Sep 29, 2011, at 7:22 PM, Heverkuhn Heverkuhn wrote:
> >>>
> >>> Hello,
> >>>> I was wondering if there is any equivalent of interaction.plot
> >>>> that allow
> >>>> you to set type="o"
> >>>> I tried to use interaction.plot and I have a gap between the
> >>>> symbols of
> >>>> the points and the line.
> >>>>
> >>>>
> >>> If it's OK to have the lines going right through the symbols, then
> >>> go ahead,
> >>> hack the code. All you need to do is add ,"o" to the type
> >>> arguments in the
> >>> argument list. The code's not hidden or anything that gets in your
> >>> way.
> >>>
> >>>
> >>> David Winsemius, MD
> >>> West Hartford, CT
> >>>
> >>>
>
> David Winsemius, MD
> West Hartford, CT
>

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.
[w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% From erich.neuwirth at univie.ac.at Mon Oct 3 12:08:38 2011 From: erich.neuwirth at univie.ac.at (Erich Neuwirth) Date: Mon, 03 Oct 2011 12:08:38 +0200 Subject: [R] extracting p-values in scientific notation In-Reply-To: References: <4E89718A.9060804@xtra.co.nz> Message-ID: <4E8989A6.90006@univie.ac.at> format.pval is documented and accessible from outside of base. So you do not have to qualify it as base::format.pval On 10/3/2011 11:24 AM, Liviu Andronic wrote: > Thanks all for your pointers. The following does trick: >> base::format.pval(x$p.value) ##Hmisc also has such a function > [1] "<2e-16" > > > On Mon, Oct 3, 2011 at 10:25 AM, Rolf Turner wrote: >> Isn't it true that 0 < 2.2e-16? >> > > Yes, but it doesn't mean that the p-value actually hits absolute zero. > And cor.test, as Ted noticed, returns >> identical(x$p.value, 0) > [1] TRUE > > Not that this makes a great practical difference in my case, but I > would still prefer to print "<2e-16" in my Sweave document. > > Regards > Liviu > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From bbolker at gmail.com Mon Oct 3 15:03:39 2011 From: bbolker at gmail.com (Ben Bolker) Date: Mon, 3 Oct 2011 13:03:39 +0000 Subject: [R] New random number generator (RNG) References: <20293a43-2717-4e49-9b31-7f909fffa35b@i23g2000yqm.googlegroups.com> Message-ID: dataguru gmail.com> writes: > Based on very fast converging series for special transcendental > numbers. Is there some R code available? > > See details about the RNG at > http://www.analyticbridge.com/profiles/blogs/new-state-of-the-art- > random-number-generator-simple-strong-and-fa (Broken URL above to make Gmane happy.) 
This article seems to be about attempting to get better *cryptographic* number generators. This is not a very new idea ... http://mathoverflow.net/questions/26942/is-pi-a-good-random-number-generator It could be a fun project, but I doubt it's better for practical purposes than the well-tested existing RNG algorithms described in ?RNGkind ... Ben Bolker From bbolker at gmail.com Mon Oct 3 15:09:47 2011 From: bbolker at gmail.com (Ben Bolker) Date: Mon, 3 Oct 2011 13:09:47 +0000 Subject: [R] Unable to load local library via GUI References: <3502B28D-274D-4A72-A49E-27D91AEE1698@unimelb.edu.au> Message-ID: Noel Faux unimelb.edu.au> writes: > > Hi all, > > Not sure if this is the best list, please point me to a more appropriate list if necessary. > > Running Mac OSX 10.7.1 > R version 2.13.1 Patched (2011-08-14 r56741) > Copyright (C) 2011 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > However, when using the R64.app and try and load the [package] > within the GUI I am getting the following error: > Error in dyn.load(file, DLLpath = DLLpath, ...) : > unable to load shared object [snip] > > dlopen([snip]): Library not loaded: libRblas.dylib > Referenced from: /Library/Frameworks/R.framework/Versions/2.13/[...]/libmlalgs.so > Reason: image not found > Error in library(libmlalgs) : .First.lib failed for 'libmlalgs' > > I know the libmlalgs.so is in the right place and the permissions look OK: > -rwxr-xr-x 1 root admin 58384 3 Oct 14:30 libmlalgs.so > > So I am stumped as to why the GUI is unable to read the file. > Any pointers would very welcome. 
As I am a textmate user and it > passes the code to the R64.app, I currently stuck with either > command line or Rstudio (it's nice but is currently missing some key > features textmate has) [snip snip snip] I'm not sure of the answer, but there has been a lot of discussion of similar issues (showing up as problems loading libRblas.dylib) on the R-sig-mac [r-sig-mac at r-project.org] mailing list. I would search through the archives there and then ask on that list if you can't find a suitable answer. Ben Bolker From therneau at mayo.edu Mon Oct 3 16:05:39 2011 From: therneau at mayo.edu (Terry Therneau) Date: Mon, 03 Oct 2011 09:05:39 -0500 Subject: [R] survexp with large dataframes Message-ID: <1317650739.979.14.camel@nemo> I've re-looked at survexp with the question of efficiency. As it stands, the code will have 3-4 (I think it's 4) active copies of the X matrix at one point; this is likely the reason it takes so much memory when you have a large data set. Some of this is history; key parts of the code were written long before I understood all the "tricks" for smaller memory in S (Splus or R), 1 copy is the loss of the COPY= argument when going from Splus to R. I can see how to redo it and reduce to 1 copy, but this involves 3 R functions and 3 C routines. I'll add it to my list but don't expect quick results due to a long list in front of it. It's been a good summer, but as one of my colleagues put it "No vacation goes unpunished." As a mid term suggestion I would use a subsample of your data. With the data set sizes you describe a 20% subsample will give all the precision that you need. Specifically: 1. Save the results of your current Cox model, call it fit1 2. Select a subset. 3. Fit a new Cox model on the subset, with the options iter=0, init=fit1$coef This ensures that the subset has exactly the same coefficients as the original. 4. Use survexp on the subset fit. 
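Terry's four steps might look like the sketch below. The lung data and the age + sex model are stand-ins for the real (large) data set and model. iter.max = 0 is the coxph.control() name for the iteration limit that Terry abbreviates as iter=0. Passing a coxph fit as ratetable assumes a survival version that supports that usage of survexp.

```r
library(survival)

# 1. Cox model on the full data (lung ships with the survival package)
fit1 <- coxph(Surv(time, status) ~ age + sex, data = lung)

# 2. Draw a 20% subsample
set.seed(42)
sub <- lung[sample(nrow(lung), round(0.2 * nrow(lung))), ]

# 3. Refit on the subsample without iterating, starting from fit1's
#    coefficients, so the subset fit carries the same coefficients
fit2 <- coxph(Surv(time, status) ~ age + sex, data = sub,
              init = coef(fit1), iter.max = 0)

# 4. Expected survival for the subsample, using the subset fit as ratetable
ex <- survexp(~ 1, data = sub, ratetable = fit2)
```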
Terry Therneau From mentor_ at gmx.net Mon Oct 3 16:12:47 2011 From: mentor_ at gmx.net (syrvn) Date: Mon, 3 Oct 2011 07:12:47 -0700 (PDT) Subject: [R] How to run Bibtex with pdfLatex in StatEt/MikTex on Windows ? In-Reply-To: <4C28FAA5.7030207@paulhurley.co.uk> References: <4C28FAA5.7030207@paulhurley.co.uk> Message-ID: <1317651167642-3867625.post@n4.nabble.com> Hello, I have exactly the same problem that bibtex is not being called and so the bibliography is not being processed... Did you find any solution for that? Many thanks syrvn -- View this message in context: http://r.789695.n4.nabble.com/How-to-run-Bibtex-with-pdfLatex-in-StatEt-MikTex-on-Windows-tp2271396p3867625.html Sent from the R help mailing list archive at Nabble.com. From benzerfa at gmx.ch Mon Oct 3 12:20:48 2011 From: benzerfa at gmx.ch (Samir Benzerfa) Date: Mon, 3 Oct 2011 12:20:48 +0200 Subject: [R] Sorting data in R according to the header of another table Message-ID: <000001cc81b6$20344830$609cd890$@ch> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sina.rueeger at gmail.com Mon Oct 3 12:47:39 2011 From: sina.rueeger at gmail.com (sina.r) Date: Mon, 3 Oct 2011 03:47:39 -0700 (PDT) Subject: [R] How to format Rsuperscript 2 followed by = value In-Reply-To: <4E8933D4.9050707@monash.edu> References: <4E8933D4.9050707@monash.edu> Message-ID: <1317638859227-3867093.post@n4.nabble.com> Hi Nevil, the function bquote() should do what you want: (found here: http://r.789695.n4.nabble.com/expression-td904189.html) mtext(bquote(R^2==.(round(summary(mylm)$r.squared,3))),1) Regards, Sina Rüeger -- View this message in context: http://r.789695.n4.nabble.com/How-to-format-Rsuperscript-2-followed-by-value-tp3866771p3867093.html Sent from the R help mailing list archive at Nabble.com.
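Here is a self-contained version of the bquote() one-liner, as a sketch. mylm is fitted to the built-in mtcars data only for illustration, because Nevil's own model is not shown. Inside bquote(), .() splices in the computed value, so the label renders as R squared followed by "=" and the rounded number.

```r
# Illustrative model on a built-in data set
mylm <- lm(mpg ~ wt, data = mtcars)
r2 <- round(summary(mylm)$r.squared, 3)

# bquote() returns an unevaluated plotmath call with r2's value spliced in
lab <- bquote(R^2 == .(r2))

plot(mpg ~ wt, data = mtcars)
abline(mylm)
mtext(lab, side = 3)  # side = 3 places the label above the plot region
```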
From ingaschwabe at gmail.com Mon Oct 3 14:11:05 2011 From: ingaschwabe at gmail.com (flokke) Date: Mon, 3 Oct 2011 05:11:05 -0700 (PDT) Subject: [R] Assigning factor names to interaction plot Message-ID: <1317643865438-3867311.post@n4.nabble.com> Hi everyone, I have the following problem: I have three variables, 'group', 'city' and 'pressure'. There is an interaction effect between group and city and I'd like to show this in an interaction plot: interaction.plot(group, city, pressure, type="b", col= c(1:2), leg.bty="o", leg.bg="blue", lwd=1, pch=c(18,24,22), xlab="Group", ylab="Pressure", main="Interaction Plot") My problem is that I can't find a proper argument to pass factor names to the variables 'group' and 'city'. In the interaction plot the groups are currently labelled '1', '2' and '3', and the cities '1' and '2'. However, I'd like to give the groups character names instead ('Therapy 1', 'Therapy 2' and 'Therapy 3'). I'd also like to give character names to the variable 'city' ('Amsterdam', 'Rotterdam', etc.). I'm quite a new user (for two weeks now), and normally I can easily find the solution to my problems on the internet, but this time I'm frustrated because I can't find solutions that help me. I'd be very glad if you could give me a hint or show me how to deal with this problem. Cheers, Maria -- View this message in context: http://r.789695.n4.nabble.com/Assigning-factor-names-to-interaction-plot-tp3867311p3867311.html Sent from the R help mailing list archive at Nabble.com. From martin.brandt at univie.ac.at Mon Oct 3 16:10:08 2011 From: martin.brandt at univie.ac.at (Martin B.) Date: Mon, 3 Oct 2011 07:10:08 -0700 (PDT) Subject: [R] stl-decomposition with missing season Message-ID: <1317651008473-3867618.post@n4.nabble.com> Dear all, I have a time series with a frequency of 10 days (so 36 values per year). One year is completely NA. Now I want to do an stl decomposition, but using e.g.
na.action = na.approx makes no sense for a whole year, of course. Is there a way to simulate this single year, or to make stl skip this year in the decomposition? -- View this message in context: http://r.789695.n4.nabble.com/stl-decomposition-with-missing-season-tp3867618p3867618.html Sent from the R help mailing list archive at Nabble.com. From murdoch.duncan at gmail.com Mon Oct 3 16:34:02 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Mon, 03 Oct 2011 10:34:02 -0400 Subject: [R] How to run Bibtex with pdfLatex in StatEt/MikTex on Windows ? In-Reply-To: <1317651167642-3867625.post@n4.nabble.com> References: <4C28FAA5.7030207@paulhurley.co.uk> <1317651167642-3867625.post@n4.nabble.com> Message-ID: <4E89C7DA.7070307@gmail.com> On 03/10/2011 10:12 AM, syrvn wrote: > Hello, > > I have exactly the same problem that bibtex is not being called and so the > bibliography is not being processed... > > Did you find any solution for that? That sounds like a StatET question. You should get bibtex if you run R CMD texi2dvi --pdf (or R CMD texi2pdf in the upcoming R 2.14). But you can always just run bibtex manually... Duncan Murdoch From bruno.piguet at gmail.com Mon Oct 3 16:35:33 2011 From: bruno.piguet at gmail.com (bruno Piguet) Date: Mon, 3 Oct 2011 16:35:33 +0200 Subject: [R] Best method to add unit information to dataframe ? Message-ID: Dear all, I'd like to have a dataframe store information about the units of the data it contains. You'll find below a minimal example of the way I do it so far. I add a "units" attribute to the dataframe. But I don't like the long syntax needed to access the unit of a given variable (namely, something like : var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame, "names"))]] Can anybody point me to a better solution ? Thanks in advance, Bruno.
# Dataframe creation x <- c(1:10) y <- c(11:20) z <- c(101:110) my_frame <- data.frame(x, y, z) attr(my_frame, "units") <- c("x_unit", "y_unit") # # later on, using dataframe for (var_name in c("x", "y")) { idx <- match(var_name, attr(my_frame, "names")) var_unit <- attr(my_frame, "units")[[idx]] print (paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit)) } From fernando.cabrera at nordea.com Mon Oct 3 16:49:41 2011 From: fernando.cabrera at nordea.com (fernando.cabrera at nordea.com) Date: Mon, 3 Oct 2011 17:49:41 +0300 Subject: [R] Matrix/Vector manipulation Message-ID: Hi guys, Have the following problem computing vectors with pure vector algebra and end up reverting to recursion or for-looping. Function my_cumsum calculates a weighted average (W) of ratios (R), but only up to the given size/volume (v). Now I recurse into the vector (from left to right) with what you have left from the difference of volume minus current weight, and stop when the difference is less than or equal to the current weight. Vectors W and R have the same length, and v is always a positive integer. W: {w_1 w_2 .. w_m} R: {r_1 r_2 .. r_m} my_cumsum <- function(v, R, W) { if (v <= W[1]) # check the head v*R[1] else W[1]*R[1] + my_cumsum(v - W[1], R[2:length(R)], W[2:length(W)]) # recurse the tail } Any help is greatly appreciated! Fernando Alvarez "Great ideas originate in the muscles." ~ Thomas A. Edison From erik.b.svensson at gmail.com Mon Oct 3 16:39:24 2011 From: erik.b.svensson at gmail.com (Erik Svensson) Date: Mon, 3 Oct 2011 07:39:24 -0700 (PDT) Subject: [R] Keep ALL duplicate records In-Reply-To: References: <1317564057860-3865136.post@n4.nabble.com> <1317579523767-3865573.post@n4.nabble.com> Message-ID: <1317652764537-3867709.post@n4.nabble.com> It worked, thank you Jim Erik -- View this message in context: http://r.789695.n4.nabble.com/Keep-ALL-duplicate-records-tp3865136p3867709.html Sent from the R help mailing list archive at Nabble.com. 
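For bruno Piguet's units question above, one way to shorten the lookup is to store the units as a named character vector and index it by column name. That removes the match() step. This approach is an assumption and was not proposed by the posters. unit_of is a hypothetical helper name. Marc Schwartz's caveat still applies: subset() returns a data frame without the attribute.

```r
x <- 1:10
y <- 11:20
z <- 101:110
my_frame <- data.frame(x, y, z)

# Named vector: each unit is keyed by its column name
attr(my_frame, "units") <- c(x = "x_unit", y = "y_unit")

# Hypothetical helper: fetch the unit for one variable by name
unit_of <- function(df, var) attr(df, "units")[[var]]

for (var_name in c("x", "y")) {
  print(paste("max", var_name, ":", max(my_frame[[var_name]]),
              unit_of(my_frame, var_name)))
}
```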
From erik.b.svensson at gmail.com Mon Oct 3 16:44:11 2011 From: erik.b.svensson at gmail.com (Erik Svensson) Date: Mon, 3 Oct 2011 07:44:11 -0700 (PDT) Subject: [R] Find all duplicate records In-Reply-To: References: <1317564311771-3865139.post@n4.nabble.com> Message-ID: <1317653051306-3867724.post@n4.nabble.com> It works, thanks a lot Gabor Erik -- View this message in context: http://r.789695.n4.nabble.com/Find-all-duplicate-records-tp3865139p3867724.html Sent from the R help mailing list archive at Nabble.com. From marc_schwartz at me.com Mon Oct 3 17:08:42 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Mon, 03 Oct 2011 10:08:42 -0500 Subject: [R] Best method to add unit information to dataframe ? In-Reply-To: References: Message-ID: <10262810-FF44-4B1B-904E-D50F22848E7D@me.com> On Oct 3, 2011, at 9:35 AM, bruno Piguet wrote: > Dear all, > > I'd like to have a dataframe store information about the units of > the data it contains. > > You'll find below a minimal exemple of the way I do, so far. I add a > "units" attribute to the dataframe. But I dont' like the long syntax > needed to access to the unit of a given variable (namely, something > like : > var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame, > "names"))]] > > Can anybody point me to a better solution ? > > Thanks in advance, > > Bruno. > > > # Dataframe creation > x <- c(1:10) > y <- c(11:20) > z <- c(101:110) > my_frame <- data.frame(x, y, z) > attr(my_frame, "units") <- c("x_unit", "y_unit") > > # > # later on, using dataframe > for (var_name in c("x", "y")) { > idx <- match(var_name, attr(my_frame, "names")) > var_unit <- attr(my_frame, "units")[[idx]] > print (paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit)) > } The problem is that there are operations on data frames (e.g. subset()) that will end up stripping your attributes. > str(my_frame) 'data.frame': 10 obs. 
of 3 variables: $ x: int 1 2 3 4 5 6 7 8 9 10 $ y: int 11 12 13 14 15 16 17 18 19 20 $ z: int 101 102 103 104 105 106 107 108 109 110 - attr(*, "units")= chr "x_unit" "y_unit" newDF <- subset(my_frame, x <= 5) > str(newDF) 'data.frame': 5 obs. of 3 variables: $ x: int 1 2 3 4 5 $ y: int 11 12 13 14 15 $ z: int 101 102 103 104 105 You might want to look at either ?comment or the ?label function in Frank's Hmisc package on CRAN, either to use or for example code on how he handles this. HTH, Marc Schwartz From michael.weylandt at gmail.com Mon Oct 3 17:06:54 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Mon, 3 Oct 2011 11:06:54 -0400 Subject: [R] Sorting data in R according to the header of another table In-Reply-To: <000001cc81b6$20344830$609cd890$@ch> References: <000001cc81b6$20344830$609cd890$@ch> Message-ID: <49D55865-40A6-4B81-928C-2788575941B5@gmail.com> Try this: X = 1:5; names(X) = letters[sample(5)] Y = matrix(1:25, 5); colnames(Y) = letters[1:5] Y[ , names(X)] Hope this helps, Michael Weylandt On Oct 3, 2011, at 6:20 AM, "Samir Benzerfa" wrote: > Hi everyone, > > > > My (simplified) problem is the following one: I have two tables. The first > table contains 5 columns with 5 values and the second table contains a > single value for each vector name of the first table (see tables below): > > > > Table 1: > > > > A B C D E > > 1 2 3 4 5 > > 2 3 4 5 6 > > 3 4 5 6 7 > > 4 5 6 7 8 > > 5 6 7 8 9 > > > > > > Table 2: > > > > A B C D E > > 3 5 2 1 4 > > > > My goal is to first sort the values of Table 2 and then use the new header > of it to sort the columns of Table 1 according to that. I already sorted > Table 2 by using the sort(Table2) function getting the following result: > > > > D C A E B > > 1 2 3 4 5 > > > > How can I now sort the columns in Table 1 according to the header of the new > sorted Table 2. That is, have column D in the first position of Table 1, > column C in the second, and forth. > > > > Many thanks for your assistance! 
> > S.B. > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From bby2103 at columbia.edu Mon Oct 3 17:10:57 2011 From: bby2103 at columbia.edu (bby2103 at columbia.edu) Date: Mon, 03 Oct 2011 11:10:57 -0400 Subject: [R] difference between createPartition and createfold functions In-Reply-To: References: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu> <20111002155420.wpg4myszy80s8csc@cubmail.cc.columbia.edu> Message-ID: <20111003111057.ylfp8eoe2ow8g0s8@cubmail.cc.columbia.edu> Hi Max, Thanks for the note. In your last paragraph, did you mean "in createDataPartition"? I'm a little vague about what returnTrain option does. Bonnie Quoting Max Kuhn : > Basically, createDataPartition is used when you need to make one or > more simple two-way splits of your data. For example, if you want to > make a training and test set and keep your classes balanced, this is > what you could use. It can also make multiple splits of this kind (or > leave-group-out CV aka Monte Carlos CV aka repeated training test > splits). > > createFolds is exclusively for k-fold CV. Their usage is simular when > you use the returnTrain = TRUE option in createFolds. > > Max > > On Sun, Oct 2, 2011 at 4:00 PM, Steve Lianoglou > wrote: >> Hi, >> >> On Sun, Oct 2, 2011 at 3:54 PM, ? wrote: >>> Hi Steve, >>> >>> Thanks for the note. I did try the example and the result didn't make sense >>> to me. For splitting a vector, what you describe is a big difference btw >>> them. For splitting a dataframe, I now wonder if these 2 functions are the >>> wrong choices. They seem to split the columns, at least in the few things I >>> tried. >> >> Sorry, I'm a bit confused now as to what you are after. 
>> >> You don't pass in a data.frame into any of the >> createFolds/DataPartition functions from the caret package. >> >> You pass in a *vector* of labels, and these functions tells you which >> indices into the vector to use as examples to hold out (or keep >> (depending on the value you pass in for the `returnTrain` argument)) >> between each fold/partition of your learning scenario (eg. cross >> validation with createFolds). >> >> You would then use these indices to keep (remove) the rows of a >> data.frame, if that is how you are storing your examples. >> >> Does that make sense? >> >> -steve >> >> -- >> Steve Lianoglou >> Graduate Student: Computational Systems Biology >> | Memorial Sloan-Kettering Cancer Center >> | Weill Medical College of Cornell University >> Contact Info: http://cbio.mskcc.org/~lianos/contact >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > > Max > > From jwiley.psych at gmail.com Mon Oct 3 17:15:22 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Mon, 3 Oct 2011 08:15:22 -0700 Subject: [R] Best method to add unit information to dataframe ? In-Reply-To: References: Message-ID: Hi Bruno, It sounds like what you want is really a separate class, one that stores information about units for each variable. This is far from an elegant example, but depending on your situation may be useful. I create a new class inheriting from the data frame class. This is likely fraught with problems because a formal S4 class is inheriting from an informal S3. Then a data frame can be stored in the .Data slot (special---I did not make it), but character data can also be stored in the units slot (which I did define). 
You could get fancier imposing constraints that the length of units be equal to the number of columns in the data frame or the like. S3 methods for data frames should still mostly work, but you also have the ability to access the new units slot. You could define special S4 methods to do the extraction then, if you wanted, so that your ultimate syntax to get the units of a particular variable would be shorter. setOldClass("data.frame") setClass("mydf", representation(units = "character"), contains = "data.frame", S3methods = TRUE) tmp <- new("mydf") tmp@.Data <- mtcars tmp@row.names <- rownames(mtcars) tmp@units <- c("x", "y") ## data frameish colMeans(tmp) tmp + 10 # but
tmp@units Cheers, Josh N.B. I've read Chambers' book once and skimmed it again, but I still do not have a solid grasp on S4 so I may have made some fundamental blunder in the example. On Mon, Oct 3, 2011 at 7:35 AM, bruno Piguet wrote: > Dear all, > > I'd like to have a dataframe store information about the units of > the data it contains. > > You'll find below a minimal exemple of the way I do, so far. I add a > "units" attribute to the dataframe. But I dont' like the long syntax > needed to access to the unit of a given variable (namely, something > like : > var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame, > "names"))]] > > Can anybody point me to a better solution ? > > Thanks in advance, > > Bruno. > > > # Dataframe creation > x <- c(1:10) > y <- c(11:20) > z <- c(101:110) > my_frame <- data.frame(x, y, z) > attr(my_frame, "units") <- c("x_unit", "y_unit") > > # > # later on, using dataframe > for (var_name in c("x", "y")) { > idx <- match(var_name, attr(my_frame, "names")) > var_unit <- attr(my_frame, "units")[[idx]] >
print (paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit)) > } > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ From mxkuhn at gmail.com Mon Oct 3 17:44:47 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Mon, 3 Oct 2011 11:44:47 -0400 Subject: [R] difference between createPartition and createfold functions In-Reply-To: <20111003111057.ylfp8eoe2ow8g0s8@cubmail.cc.columbia.edu> References: <20111002144756.no8xullnuokg4sc8@cubmail.cc.columbia.edu> <20111002155420.wpg4myszy80s8csc@cubmail.cc.columbia.edu> <20111003111057.ylfp8eoe2ow8g0s8@cubmail.cc.columbia.edu> Message-ID: No, it is an argument to createFolds. Type ?createFolds to see the appropriate syntax: "returnTrain a logical. When true, the values returned are the sample positions corresponding to the data used during training. This argument only works in conjunction with list = TRUE" On Mon, Oct 3, 2011 at 11:10 AM, wrote: > Hi Max, > > Thanks for the note. In your last paragraph, did you mean "in > createDataPartition"? I'm a little vague about what returnTrain option does. > > Bonnie > > Quoting Max Kuhn : > >> Basically, createDataPartition is used when you need to make one or >> more simple two-way splits of your data. For example, if you want to >> make a training and test set and keep your classes balanced, this is >> what you could use. It can also make multiple splits of this kind (or >> leave-group-out CV aka Monte Carlos CV aka repeated training test >> splits). >> >> createFolds is exclusively for k-fold CV. 
Their usage is simular when >> you use the returnTrain = TRUE option in createFolds. >> >> Max >> >> On Sun, Oct 2, 2011 at 4:00 PM, Steve Lianoglou >> wrote: >>> >>> Hi, >>> >>> On Sun, Oct 2, 2011 at 3:54 PM, ? wrote: >>>> >>>> Hi Steve, >>>> >>>> Thanks for the note. I did try the example and the result didn't make >>>> sense >>>> to me. For splitting a vector, what you describe is a big difference btw >>>> them. For splitting a dataframe, I now wonder if these 2 functions are >>>> the >>>> wrong choices. They seem to split the columns, at least in the few >>>> things I >>>> tried. >>> >>> Sorry, I'm a bit confused now as to what you are after. >>> >>> You don't pass in a data.frame into any of the >>> createFolds/DataPartition functions from the caret package. >>> >>> You pass in a *vector* of labels, and these functions tells you which >>> indices into the vector to use as examples to hold out (or keep >>> (depending on the value you pass in for the `returnTrain` argument)) >>> between each fold/partition of your learning scenario (eg. cross >>> validation with createFolds). >>> >>> You would then use these indices to keep (remove) the rows of a >>> data.frame, if that is how you are storing your examples. >>> >>> Does that make sense? >>> >>> -steve >>> >>> -- >>> Steve Lianoglou >>> Graduate Student: Computational Systems Biology >>> ?| Memorial Sloan-Kettering Cancer Center >>> ?| Weill Medical College of Cornell University >>> Contact Info: http://cbio.mskcc.org/~lianos/contact >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. 
>>> >> >> >> >> -- >> >> Max >> >> > > > -- Max From David.Reiner at xrtrading.com Mon Oct 3 17:56:58 2011 From: David.Reiner at xrtrading.com (David Reiner) Date: Mon, 3 Oct 2011 10:56:58 -0500 Subject: [R] Matrix/Vector manipulation Message-ID: <9DE405308A6AA24AA794B76282C6C00F0A069BF445@HQ-POST1> sum(ifelse(cumsum(W)<=v, W, 0) * R) HTH, David L. Reiner -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of fernando.cabrera at nordea.com Sent: Monday, October 03, 2011 9:50 AM To: r-help at r-project.org Subject: [SPAM] - [R] Matrix/Vector manipulation - Bayesian Filter detected spam Hi guys, Have the following problem computing vectors with pure vector algebra and end up reverting to recursion or for-looping. Function my_cumsum calculates a weighted average (W) of ratios (R), but only up to the given size/volume (v). Now I recurse into the vector (from left to right) with what you have left from the difference of volume minus current weight, and stop when the difference is less than or equal to the current weight. Vectors W and R have the same length, and v is always a positive integer. W: {w_1 w_2 .. w_m} R: {r_1 r_2 .. r_m} my_cumsum <- function(v, R, W) { if (v <= W[1]) # check the head v*R[1] else W[1]*R[1] + my_cumsum(v - W[1], R[2:length(R)], W[2:length(W)]) # recurse the tail } Any help is greatly appreciated! Fernando Alvarez "Great ideas originate in the muscles." ~ Thomas A. Edison ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
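The ifelse() one-liner above only keeps weights that fit entirely within v. Fernando's recursion also takes the partial weight, v minus the sum of the earlier weights, from the element where the volume runs out. The sketch below reproduces the recursion without looping, assuming v is positive and does not exceed sum(W). vec_cumsum and reiner are illustrative names. The recursion is restated with R[-1] in place of R[2:length(R)], because 2:1 counts down for length-one vectors. Dividing the result by v gives the weighted average the original post describes.

```r
# Recursive version from the original post (R[-1] drops the head element)
my_cumsum <- function(v, R, W) {
  if (v <= W[1]) v * R[1]
  else W[1] * R[1] + my_cumsum(v - W[1], R[-1], W[-1])
}

# Vectorized equivalent
vec_cumsum <- function(v, R, W) {
  before <- cumsum(W) - W               # volume consumed before element i
  used <- pmin(W, pmax(v - before, 0))  # weight actually taken from element i
  sum(used * R)
}

# The ifelse() one-liner, for comparison
reiner <- function(v, R, W) sum(ifelse(cumsum(W) <= v, W, 0) * R)

W <- c(3, 5, 2, 4)
R <- c(0.5, 1.2, 0.8, 1.0)

vec_cumsum(4, R, W)  # 3*0.5 + 1*1.2: includes the partial second weight
reiner(4, R, W)      # 3*0.5 only
```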
This e-mail and any materials attached hereto, including, without limitation, all content hereof and thereof (collectively, "XR Content") are confidential and proprietary to XR Trading, LLC ("XR") and/or its affiliates, and are protected by intellectual property laws. Without the prior written consent of XR, the XR Content may not (i) be disclosed to any third party or (ii) be reproduced or otherwise used by anyone other than current employees of XR or its affiliates, on behalf of XR or its affiliates. THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES OF ANY KIND. TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW, XR HEREBY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, RELATING TO THE XR CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES SHALL IN ANY EVENT BE LIABLE FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL AND PUNITIVE DAMAGES, LOSS OF PROFITS AND TRADING LOSSES, RESULTING FROM ANY PERSON'S USE OR RELIANCE UPON, OR INABILITY TO USE, ANY XR CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY OF SUCH DAMAGES OR IF SUCH DAMAGES WERE FORESEEABLE. From joseph.g.boyer at gsk.com Mon Oct 3 18:13:57 2011 From: joseph.g.boyer at gsk.com (Joseph Boyer) Date: Mon, 3 Oct 2011 16:13:57 +0000 Subject: [R] how to get old packages to work on R 2.12.1 In-Reply-To: References: <536728FB13AAC2479BA9B9AC36A5C2AE865037ADB0@019D-NAMSG-02.019D.MGD.MSFT.NET> <7BFBD1073F134AF3AE697CC1BA59FC96@kcom.edu> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gunter.berton at gene.com Mon Oct 3 18:17:31 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Mon, 3 Oct 2011 09:17:31 -0700 Subject: [R] Best method to add unit information to dataframe ? In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From David.Reiner at xrtrading.com Mon Oct 3 18:32:18 2011 From: David.Reiner at xrtrading.com (David Reiner) Date: Mon, 3 Oct 2011 11:32:18 -0500 Subject: [R] getting list of data.frame names In-Reply-To: References: Message-ID: <9DE405308A6AA24AA794B76282C6C00F0A069BF44A@HQ-POST1> I know you already got a good answer from Joshua Wiley. Here is a function I find useful. Anyone who wants to suggest improvements, please do so! > my.ls function(pos=1, sorted=FALSE, mode, class){ .result <- sapply(ls(pos=pos, all.names=TRUE), function(..x)object.size(eval(as.symbol(..x)))) if (sorted){ .result <- rev(sort(.result)) } .ls <- as.data.frame(rbind(as.matrix(.result),"**Total"=sum(.result))) names(.ls) <- "Size" .ls$Size <- formatC(.ls$Size, big.mark=',', digits=0, format='f') .ls$Mode <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)], function(x)base::mode(eval(as.symbol(x))))), '-------') if (!missing(mode)) { .ls <- .ls[sapply(.ls$Mode, function(x) any(unlist(strsplit(x, ", ")) %in% mode)), , drop=FALSE] } .ls$Class <- c(sapply(rownames(.ls)[-nrow(.ls)], function(x) paste(unlist(base::class(eval(as.symbol(x)))), collapse=", ")), '-------') if (!missing(class)) { .ls <- .ls[sapply(.ls$Class, function(x) any(unlist(strsplit(x, ", ")) %in% class)), , drop=FALSE] } .ls } So: > my.ls(class='data.frame') Size Mode Class df 1,320 list data.frame DF 648 list data.frame DF2 712 list data.frame geu1 72,864 list data.frame result 1,384 list data.frame t 896 list data.frame HTH, -- David L. Reiner -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Erin Hodgess Sent: Saturday, October 01, 2011 10:38 PM To: R help Subject: [R] getting list of data.frame names Dear R People: This is probably a very simple question. 
I know that if I want to get a list of the classes of the objects in the workspace, I can do this: > sapply(ls(), function(x)class(get(x))) a a1.df b d "list" "data.frame" "integer" "numeric" Now I want to get just the data frames. > sapply(ls(), function(x)class(get(x))=="data.frame") a a1.df b d FALSE TRUE FALSE FALSE However, I would like the names of the data frames, rather than the True/False for the objects. I've been trying all sorts of combinations/permutations with no success. Any suggestions would be much appreciated. Thanks, Sincerely, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodgess at gmail.com ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
From djmuser at gmail.com Mon Oct 3 18:34:50 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Mon, 3 Oct 2011 09:34:50 -0700 Subject: [R] Assigning factor names to interaction plot In-Reply-To: <1317643865438-3867311.post@n4.nabble.com> References: <1317643865438-3867311.post@n4.nabble.com> Message-ID: Hi: A small toy example: fakedata <- data.frame(group = factor(rep(1:3, each = 10), labels = paste('Therapy', 1:3)), city = factor(rep(c('Amsterdam', 'Rotterdam'), each = 5)), pressure = rnorm(30)) with(fakedata, interaction.plot(group, city, pressure, type="b", col= c(1:2), leg.bty="o", leg.bg="skyblue", lwd=1, pch=c(18,24,22), xlab="Group", ylab="Pressure", main="Interaction Plot") ) HTH, Dennis On Mon, Oct 3, 2011 at 5:11 AM, flokke wrote: > Hi everyone, > I have the following problem: > > I have three variables, 'group', 'city' and 'pressure' > > There is an interaction effect between group and city and I'd like to show > this in an interaction plot: > > interaction.plot(group, city, pressure, type="b", > col= c(1:2), > leg.bty="o", leg.bg="blue", lwd=1, pch=c(18,24,22), > xlab="Group", > ylab="Pressure", > main="Interaction Plot") > > My problem is that I cant find a proper argument to pass factor names to the > variables 'group' and 'city'.
>
> In the interaction plot now the groups are referred to as '1', '2' and '3'
> and the cities are referred to as '1', '2'. However, I'd like to pass
> string character names to those ('Therapy 1', 'Therapy 2' and 'Therapy 3'). I'd
> also like to pass
> string character names to the variable 'city' ('Amsterdam', 'Rotterdam',
> etc.)
>
> I'm quite a new user (for two weeks now), and normally I can easily find the
> solution to my problems on
> the internet, but this time I'm frustrated because I can't find
> solutions that are helping me.
>
> I'd be very glad if you could give me a hint or could show me how to deal
> with this problem.
>
> Cheers,
> Maria
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Assigning-factor-names-to-interaction-plot-tp3867311p3867311.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
From mailinglist.honeypot at gmail.com Mon Oct 3 18:43:51 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Mon, 3 Oct 2011 12:43:51 -0400 Subject: [R] Best method to add unit information to dataframe ? In-Reply-To: References: Message-ID: Hi, If you want to take advantage of Josh's example below (using an S4 subclass of data.frame), perhaps you might be interested in taking advantage of the multitude of useful objects/classes defined in the bioconductor IRanges package: http://www.bioconductor.org/packages/release/bioc/html/IRanges.html It has no other bioconductor dependencies, so it's a "slim" install, in that respect. It defines a DataFrame class which keeps "metadata" around with it as you subset/index/etc.
it, e.g.:

R> library(IRanges)
R> DF <- DataFrame(a=1:10, b=letters[1:10])
R> metadata(DF) <- list(units=list(a=NA, b='inches'))
R> sub.1 <- subset(DF, a %% 2 == 0)
R> sub.1
DataFrame with 5 rows and 2 columns
   a b
1  2 b
2  4 d
3  6 f
4  8 h
5 10 j
R> metadata(sub.1)
$units
$units$a
[1] NA

$units$b
[1] "inches"

(although I noticed that transform,DataFrame isn't defined actually ...) Anyway, HTH. -steve On Mon, Oct 3, 2011 at 11:15 AM, Joshua Wiley wrote:
> Hi Bruno,
>
> It sounds like what you want is really a separate class, one that
> stores information about units for each variable.  This is far from an
> elegant example, but depending on your situation may be useful.  I
> create a new class inheriting from the data frame class.  This is
> likely fraught with problems because a formal S4 class is inheriting
> from an informal S3.  Then a data frame can be stored in the .Data
> slot (special---I did not make it), but character data can also be
> stored in the units slot (which I did define).  You could get fancier
> imposing constraints that the length of units be equal to the number
> of columns in the data frame or the like.  S3 methods for data frames
> should still mostly work, but you also have the ability to access the
> new units slot.  You could define special S4 methods to do the
> extraction then, if you wanted, so that your ultimate syntax to get
> the units of a particular variable would be shorter.
>
> setOldClass("data.frame")
>
> setClass("mydf", representation(units = "character"),
>   contains = "data.frame", S3methods = TRUE)
>
> tmp <- new("mydf")
>
> tmp@.Data <- mtcars
> tmp@row.names <- rownames(mtcars)
> tmp@units <- c("x", "y")
>
> ## data frameish
> colMeans(tmp)
> tmp + 10
>
> # but
> tmp@units
>
> Cheers,
>
> Josh
>
> N.B. I've read once and skimmed again Chambers' book, but I still do
> not have a solid grasp on S4 so I may have made some fundamental
> blunder in the example.
>
>
>
> On Mon, Oct 3, 2011 at 7:35 AM, bruno Piguet wrote:
>> Dear all,
>>
>>  I'd like to have a dataframe store information about the units of
>> the data it contains.
>>
>>  You'll find below a minimal example of the way I do, so far. I add a
>> "units" attribute to the dataframe. But I don't like the long syntax
>> needed to access the unit of a given variable (namely, something
>> like :
>>   var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame,
>> "names"))]]
>>
>>  Can anybody point me to a better solution ?
>>
>> Thanks in advance,
>>
>> Bruno.
>>
>>
>> # Dataframe creation
>> x <- c(1:10)
>> y <- c(11:20)
>> z <- c(101:110)
>> my_frame <- data.frame(x, y, z)
>> attr(my_frame, "units") <- c("x_unit", "y_unit")
>>
>> #
>> # later on, using dataframe
>> for (var_name in c("x", "y")) {
>>   idx <- match(var_name, attr(my_frame, "names"))
>>   var_unit <- attr(my_frame, "units")[[idx]]
>>   print (paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit))
>> }
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, ATS Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From mentor_ at gmx.net Mon Oct 3 18:48:06 2011 From: mentor_ at gmx.net (syrvn) Date: Mon, 3 Oct 2011 09:48:06 -0700 (PDT) Subject: [R] How to run Bibtex with pdfLatex in StatEt/MikTex on Windows ? In-Reply-To: <4E89C7DA.7070307@gmail.com> References: <4C28FAA5.7030207@paulhurley.co.uk> <1317651167642-3867625.post@n4.nabble.com> <4E89C7DA.7070307@gmail.com> Message-ID: <1317660486076-3868063.post@n4.nabble.com> Hi Duncan, you were right. texi2dvi does latex + bibtex. Unfortunately I cannot get it running. When I run texi2dvi(file = "path/to/tex/file", pdf=TRUE, quiet=FALSE) then I get the following error message:

___________________________________________________________________________
Error in texi2dvi(file = "/Users/XXX/Desktop/test/body.tex", texinputs = "/Library/Frameworks/R.framework/Resources/share/texmf/tex/latex/tex", :
  Running 'texi2dvi' on '/Users/XXX/Desktop/test/body.tex' failed.
Output:
You don't have a working TeX binary (tex) installed anywhere in
your PATH, and texi2dvi cannot proceed without one. If you want to use
this script, you'll need to install TeX (if you don't have it) or change
your PATH or TEX environment variable (if you do). See the --help
output for more details.

For information about obtaining TeX, please see http://www.tug.org. If
you happen to be using Debian, you can get it with this command:
  apt-get install tetex-bin
__________________________________________________________________________

So the output tells us it cannot find a tex binary file. If I open my terminal and type in:

/Library/Frameworks/R.framework/Resources/share/texmf/tex/latex/tex

I get the following output:

This is pdfTeX, Version 3.1415926-2.3-1.40.12 (TeX Live 2011)
**

which means the tex binary is there.
-- View this message in context: http://r.789695.n4.nabble.com/How-to-run-Bibtex-with-pdfLatex-in-StatEt-MikTex-on-Windows-tp2271396p3868063.html Sent from the R help mailing list archive at Nabble.com. From ggrothendieck at gmail.com Mon Oct 3 18:56:48 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Mon, 3 Oct 2011 12:56:48 -0400 Subject: [R] Best method to add unit information to dataframe ? In-Reply-To: References: Message-ID: On Mon, Oct 3, 2011 at 10:35 AM, bruno Piguet wrote:
> Dear all,
>
>  I'd like to have a dataframe store information about the units of
> the data it contains.
>
>  You'll find below a minimal example of the way I do, so far. I add a
> "units" attribute to the dataframe. But I don't like the long syntax
> needed to access the unit of a given variable (namely, something
> like :
>   var_unit <- attr(my_frame, "units")[[match(var_name, attr(my_frame,
> "names"))]]
>
>  Can anybody point me to a better solution ?
>
> Thanks in advance,
>
> Bruno.
>
>
> # Dataframe creation
> x <- c(1:10)
> y <- c(11:20)
> z <- c(101:110)
> my_frame <- data.frame(x, y, z)
> attr(my_frame, "units") <- c("x_unit", "y_unit")
>
> #
> # later on, using dataframe
> for (var_name in c("x", "y")) {
>   idx <- match(var_name, attr(my_frame, "names"))
>   var_unit <- attr(my_frame, "units")[[idx]]
>   print (paste("max ", var_name, ": ", max(my_frame[[var_name]]), var_unit))
> }

The Hmisc package has some support for this:

library(Hmisc)
DF <- data.frame(x, y, z)
units(DF$x) <- "my x units"
units(DF$y) <- "my y units"
units(DF$x)

-- Statistics & Software Consulting GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From batholdy at googlemail.com Mon Oct 3 19:05:25 2011 From: batholdy at googlemail.com (Martin Batholdy) Date: Mon, 3 Oct 2011 19:05:25 +0200 Subject: [R] intensity map / plot Message-ID: <1D088B40-E7EB-4B9E-A19F-97EC47732DEA@googlemail.com> Dear R-list, I would like to generate an intensity map based on an x * y matrix. Each point in the matrix should get plotted at the coordinate: x = column / y = row with a color-intensity (for example gray-value) based on the actual value of this point. Is there a convenient package / function for this kind of plot? example data:

x <- matrix(rnorm(100*100, 0, 2), 100, 100)

thanks! From mentor_ at gmx.net Mon Oct 3 19:06:49 2011 From: mentor_ at gmx.net (syrvn) Date: Mon, 3 Oct 2011 10:06:49 -0700 (PDT) Subject: [R] How to run Bibtex with pdfLatex in StatEt/MikTex on Windows ? In-Reply-To: <1317660486076-3868063.post@n4.nabble.com> References: <4C28FAA5.7030207@paulhurley.co.uk> <1317651167642-3867625.post@n4.nabble.com> <4E89C7DA.7070307@gmail.com> <1317660486076-3868063.post@n4.nabble.com> Message-ID: <1317661609013-3868123.post@n4.nabble.com> I am a bit confused... I wrote a little shell script called tex2pdf which just calls:

texi2pdf body.tex

and when I execute it in a terminal it runs smoothly. If I type into R:

system("/Path/to/tex2pdf")

I get the same error message:

You don't have a working TeX binary (tex) installed anywhere in
your PATH, and texi2dvi cannot proceed without one. If you want to use
this script, you'll need to install TeX (if you don't have it) or change
your PATH or TEX environment variable (if you do). See the --help
output for more details.

For information about obtaining TeX, please see http://www.tug.org. If
you happen to be using Debian, you can get it with this command:
  apt-get install tetex-bin

How can this happen? I originally thought it's a problem with the texi2dvi function in R but it seems to be something else...
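One way to see the difference from inside R, sketched here under the assumption (which syrvn's later message in this thread bears out) that R was started with a different PATH than the terminal: `Sys.which()` reports where R's own PATH finds a program and returns an empty string when it finds none, and `Sys.setenv()` changes the PATH inherited by anything launched through `system()`, including the `texi2dvi` script. The helper `prepend_to_path()` and the directory `/usr/texbin` are illustrative assumptions, not taken from the thread.

```r
# Hypothetical helper: put `dir` at the front of this R session's PATH, which
# child processes started via system() (and texi2dvi) then inherit.
prepend_to_path <- function(dir) {
  sep <- .Platform$path.sep
  parts <- strsplit(Sys.getenv("PATH"), sep, fixed = TRUE)[[1]]
  if (!(dir %in% parts)) {
    Sys.setenv(PATH = paste(c(dir, parts), collapse = sep))
  }
  invisible(Sys.getenv("PATH"))
}

# An empty string here means R's PATH cannot see a `tex` binary, even if the
# terminal's PATH can.
Sys.which("tex")

# Assumed TeX location; replace with the directory `which tex` reports in a terminal.
prepend_to_path("/usr/texbin")
Sys.which("tex")
```

Putting the `prepend_to_path()` call in a startup file such as `~/.Rprofile` would apply it to every session, including one started from an IDE, assuming the IDE does not launch R with `--vanilla` or `--no-init-file`.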
-- View this message in context: http://r.789695.n4.nabble.com/How-to-run-Bibtex-with-pdfLatex-in-StatEt-MikTex-on-Windows-tp2271396p3868123.html Sent from the R help mailing list archive at Nabble.com. From ligges at statistik.tu-dortmund.de Mon Oct 3 19:11:57 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 03 Oct 2011 19:11:57 +0200 Subject: [R] intensity map / plot In-Reply-To: <1D088B40-E7EB-4B9E-A19F-97EC47732DEA@googlemail.com> References: <1D088B40-E7EB-4B9E-A19F-97EC47732DEA@googlemail.com> Message-ID: <4E89ECDD.8070809@statistik.tu-dortmund.de> See ?image Uwe Ligges On 03.10.2011 19:05, Martin Batholdy wrote: > Dear R-list, > > > I would like to generate an intensity map based on a x * y matrix. > > Each point in the matrix should get plotted at the coordinate: x = column / y = row with > a color-intensity (for example gray-value) based on the actual value of this point. > > Is there a convenient package / function for this kind of plot? > > > > example data: > x<- matrix(rnorm(100*100, 0, 2), 100, 100) > > > thanks! > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Mon Oct 3 19:13:39 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 03 Oct 2011 19:13:39 +0200 Subject: [R] about the array transpose In-Reply-To: <1317605351283-3866241.post@n4.nabble.com> References: <1317605351283-3866241.post@n4.nabble.com> Message-ID: <4E89ED43.50301@statistik.tu-dortmund.de> On 03.10.2011 03:29, venerealdisease wrote: > Hi, all, > > I am a newbie for [R] > Would anyone help me how to transpose a 3x3x3 array for 1:27 > > Eg. 
> A<-array(1:27, c(3,3,3)) > > What is the logic to transpose it to B<-aperm(A, c(3,2,1)) It simply says third dimension first, second second, and first third. Uwe Ligges > > Because I found I could not imagine how it transposes, anyone could solve my > problem? > And most important I could get the number what I expected, I think if I > could not figure it out, I will have a confused concept which will affect my > future learning of 3D models in [R]. > > Highly appreciated and thanks. > > VD > > > > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/about-the-array-transpose-tp3866241p3866241.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From michael.weylandt at gmail.com Mon Oct 3 19:16:47 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Mon, 3 Oct 2011 13:16:47 -0400 Subject: [R] rolling regression In-Reply-To: References: Message-ID: It seems you don't really know how predict works. If you don't supply new data, it will only return the least squares fit to the old data, which is the large data block you saw. Check the first example given in ?predict to see how this works for new (out of sample) data. More importantly, use of lm() gives a model for contemporaneous fitting of your data to cash_ret. You probably need to use a time series model that has forecasting built into it (unless you can somehow obtain your independent variables before your dependent variables). Michael Weylandt On Sun, Oct 2, 2011 at 11:41 PM, Darius H wrote: > > Dear all, > > I have spent the last few days on a seemingly simple and previously documented rolling regression. > > I have a 60 year data set organized in a ts matrix.
> The matrix has 5 columns; cash_ret, epy1, ism1, spread1, unemp1
>
> I have been able to come up with the following based on previous help threads. It seems to work fine.
> The trouble is I get regression coefficients but need the immediate next period forecast.
>
> cash_fit= rollapply(cash_data, width=60,
>
> function(x) coef(lm(cash_ret~epy1+ism1+spread1+unemp1, data = as.data.frame(x))),
>
> by.column=FALSE, align="right"); cash_fit
>
>
> I tried to replace "coef" above to "predict" but I get a whole bunch of results too big to be displayed. I would be grateful
> if someone could guide me on how to get the next period forecast after each regression.
>
> If there is a possibility of getting the significance of each regressor and the standard error in addition to R-sq
> without having to spend the next week, that would be helpful as well.
>
> Many thanks,
> Darius
>
>
>
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
From michael.weylandt at gmail.com Mon Oct 3 19:21:00 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Mon, 3 Oct 2011 13:21:00 -0400 Subject: [R] about the array transpose In-Reply-To: <4E89ED43.50301@statistik.tu-dortmund.de> References: <1317605351283-3866241.post@n4.nabble.com> <4E89ED43.50301@statistik.tu-dortmund.de> Message-ID: Mr. Disease, As Uwe points out, the syntax is pretty clear, but it is perhaps worth mulling over why:

R> identical(A[,,1], t(B[1,,]))
TRUE

to confirm that you understand the function. Michael Weylandt PS -- Might I suggest, if you insist on anonymity, a different handle? I'm answering on my lunch break and I'm finding that you are providing me with all sorts of icky mental images...
2011/10/3 Uwe Ligges : > > > On 03.10.2011 03:29, venerealdisease wrote: >> >> Hi, all, >> >> I am a newbie for [R] >> Would anyone help me how to transpose a 3x3x3 array for 1:27 >> >> Eg. >> A<-array(1:27, c(3,3,3) >> >> What is the logic to transpose it to B<-aperm(A, c(3,2,1)) > > It simply says third dimension first, second second, and first third. > > Uwe Ligges > > >> >> Because I found I could not imagine how it transposes, anyone could solve >> my >> problem? >> And most important I could get the number what I expected, I think if I >> could not figure it out, I will have a confused concept which will affect >> my >> future learning of 3D models in [R]. >> >> Highly appreciated and thanks. >> >> VD >> >> >> >> >> >> >> >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/about-the-array-transpose-tp3866241p3866241.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
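A small check of the rule Uwe states, using the same `A` and `B` as in the question: with `perm = c(3, 2, 1)`, dimension 3 of `A` becomes dimension 1 of `B`, so every element satisfies `B[i, j, k] == A[k, j, i]`. The index matrix built with `expand.grid()` is only a compact way to test all 27 positions at once; Michael's `identical()` check is the `k = 1` slice of the same rule.

```r
A <- array(1:27, c(3, 3, 3))
B <- aperm(A, c(3, 2, 1))

# All 27 (i, j, k) index triples, one per row.
idx <- as.matrix(expand.grid(i = 1:3, j = 1:3, k = 1:3))

# Matrix indexing: B[idx] reads B at (i, j, k); A[idx[, 3:1]] reads A at (k, j, i).
all(B[idx] == A[idx[, 3:1]])

# Michael's check: slice k = 1 of A is the transpose of slice i = 1 of B.
identical(A[, , 1], t(B[1, , ]))

# One concrete element: A is filled in column-major order, so
# A[1, 2, 3] = 1 + (2 - 1) * 3 + (3 - 1) * 9 = 22, and B[3, 2, 1] holds the same value.
A[1, 2, 3]
B[3, 2, 1]
```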
> From Thomas.Adams at noaa.gov Mon Oct 3 19:24:51 2011 From: Thomas.Adams at noaa.gov (Thomas Adams) Date: Mon, 03 Oct 2011 13:24:51 -0400 Subject: [R] Question about ggplot2 and stat_smooth Message-ID: <4E89EFE3.5050804@noaa.gov> I'm interested in creating a graphic -like- this: c <- ggplot(mtcars, aes(qsec, wt)) c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2, alpha = 0.2) but I need to show 2 sets of bands (with different shading) using 5%, 25%, 75%, 95% limits that I specify and where the heavy blue line is the median. I don't understand how to do this with ggplot2. What I am doing currently is to generate 'boxplots' (with 5%, 25%, 75%, 95% limits) at 6-hourly time steps (so I have a series of boxplots, which you can see by clicking on a map point: http://www.erh.noaa.gov/mmefs/index_test.php?Lat=38.2&Lon=-80.1&Zoom=5&Refresh=0&RFCOverlay=0&Model=NAEFS). Some who use our graphics would like to see something more like the ggplot2 with stat_smooth graphic. Help is much appreciated. Regards, Tom -- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: thomas.adams at noaa.gov VOICE: 937-383-0528 FAX: 937-383-0033 From pdalgd at gmail.com Mon Oct 3 19:27:31 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Mon, 3 Oct 2011 19:27:31 +0200 Subject: [R] generating Venn diagram with 6 sets In-Reply-To: References: Message-ID: <695BFFF8-7143-49FE-8A88-8DCA1B2C34B9@gmail.com> On Oct 2, 2011, at 18:25 , Mao Jianfeng wrote: > But, vennerable can not be installed on my Mac book. Works for me. What are the symptoms? 
-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From mentor_ at gmx.net Mon Oct 3 19:33:04 2011 From: mentor_ at gmx.net (syrvn) Date: Mon, 3 Oct 2011 10:33:04 -0700 (PDT) Subject: [R] How to run Bibtex with pdfLatex in StatEt/MikTex on Windows ? In-Reply-To: <1317661609013-3868123.post@n4.nabble.com> References: <4C28FAA5.7030207@paulhurley.co.uk> <1317651167642-3867625.post@n4.nabble.com> <4E89C7DA.7070307@gmail.com> <1317660486076-3868063.post@n4.nabble.com> <1317661609013-3868123.post@n4.nabble.com> Message-ID: <1317663184975-3868226.post@n4.nabble.com> Hi, I know now why it did not work. See here: http://www.mail-archive.com/r-help@r-project.org/msg19682.html I started R via the terminal with open -a R and then called the script again as well as the texi2dvi function. Both worked. The problem now is I use R within Eclipse and I don't know how to start R in that way so that the paths are set correctly. -- View this message in context: http://r.789695.n4.nabble.com/How-to-run-Bibtex-with-pdfLatex-in-StatEt-MikTex-on-Windows-tp2271396p3868226.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Mon Oct 3 20:16:16 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Mon, 3 Oct 2011 11:16:16 -0700 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: <4E89EFE3.5050804@noaa.gov> References: <4E89EFE3.5050804@noaa.gov> Message-ID: Hi: I would think that, at least in principle, this should work:

a <- ggplot(mtcars, aes(qsec, wt))
a + geom_point() +
  stat_smooth(fill="blue", colour="darkblue", size=2, level = 0.9, alpha = 0.2) +
  stat_smooth(fill = 'blue', colour = 'darkblue', size = 2, level = 0.5, alpha = 0.4)

but geom_smooth() doesn't seem to react to different values of level. I get the same bands on geom_smooth() if I use level = 0.95, 0.75 or 0.5.
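Thomas's request is for percentiles of an ensemble at each time step, whereas the `level` argument of `stat_smooth()` sets the confidence level of the interval around a fitted smoother, so even a responsive `level` would describe uncertainty in the fitted curve rather than the spread of the members. Below is a base-R sketch of the plot he describes: compute the 5, 25, 50, 75 and 95% quantiles at each step and shade between them with `polygon()`. The ensemble (40 members on a 6-hourly grid) is simulated purely for illustration; in ggplot2 the analogous layer would be `geom_ribbon()` with `ymin` and `ymax` mapped to the quantile columns.

```r
set.seed(1)

# Hypothetical ensemble: 40 members, each a random walk over 60 six-hourly steps.
n_steps   <- 60
n_members <- 40
hours <- seq(0, by = 6, length.out = n_steps)
ens <- apply(matrix(rnorm(n_steps * n_members), n_steps, n_members), 2, cumsum)

# One row per time step, one column per requested percentile.
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
q <- t(apply(ens, 1, quantile, probs = probs))

# Shade the region between two curves that share the same x values.
band <- function(x, lower, upper, col) {
  polygon(c(x, rev(x)), c(lower, rev(upper)), col = col, border = NA)
}

plot(hours, q[, 3], type = "n", ylim = range(q),
     xlab = "Forecast hour", ylab = "Value")
band(hours, q[, 1], q[, 5], col = adjustcolor("blue", alpha.f = 0.2))  # 5-95%
band(hours, q[, 2], q[, 4], col = adjustcolor("blue", alpha.f = 0.4))  # 25-75%
lines(hours, q[, 3], col = "darkblue", lwd = 2)                        # median
```

Drawing the wide band first and the narrow band on top means the overlapping translucent fills produce the darker inner shading.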
Since I make it a policy not to invoke the B word, I'll let others who know more about the code comment on it. BTW, ggplot2 related questions tend to get faster responses on its list: ggplot2 at googlegroups.com. I've taken the liberty of forwarding there since most of the developers read that list more regularly than R-help. Dennis On Mon, Oct 3, 2011 at 10:24 AM, Thomas Adams wrote: > I'm interested in creating a graphic -like- this: > > c <- ggplot(mtcars, aes(qsec, wt)) > c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2, alpha > = 0.2) > > but I need to show 2 sets of bands (with different shading) using 5%, 25%, > 75%, 95% limits that I specify and where the heavy blue line is the median. > I don't understand how to do this with ggplot2. What I am doing currently is > to generate 'boxplots' (with 5%, 25%, 75%, 95% limits) at 6-hourly time > steps (so I have a series of boxplots, which you can see by clicking on a > map point: > http://www.erh.noaa.gov/mmefs/index_test.php?Lat=38.2&Lon=-80.1&Zoom=5&Refresh=0&RFCOverlay=0&Model=NAEFS). > Some who use our graphics would like to see something more like the ggplot2 > with stat_smooth graphic. > > Help is much appreciated. > > Regards, > Tom > > -- > Thomas E Adams > National Weather Service > Ohio River Forecast Center > 1901 South State Route 134 > Wilmington, OH 45177 > > EMAIL: thomas.adams at noaa.gov > > VOICE: 937-383-0528 > FAX: 937-383-0033 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From murdoch.duncan at gmail.com Mon Oct 3 20:22:35 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Mon, 03 Oct 2011 14:22:35 -0400 Subject: [R] How to run Bibtex with pdfLatex in StatEt/MikTex on Windows ?
In-Reply-To: <1317660486076-3868063.post@n4.nabble.com> References: <4C28FAA5.7030207@paulhurley.co.uk> <1317651167642-3867625.post@n4.nabble.com> <4E89C7DA.7070307@gmail.com> <1317660486076-3868063.post@n4.nabble.com> Message-ID: <4E89FD6B.7020602@gmail.com> Your subject line says Windows, but your error message suggests MacOS. I think you need to post questions that are answerable if you want an answer. Duncan Murdoch On 03/10/2011 12:48 PM, syrvn wrote: > Hi Duncan, > > you were right. texi2dvi does latex + bibtex. Unfortunately I cannot get it > running. > > When I run texi2dvi(file = "path/to/tex/file", pdf=TRUE, quiet=FALSE) then I > get the following error message: > > ___________________________________________________________________________ > Error in texi2dvi(file = "/Users/XXX/Desktop/test/body.tex", texinputs = > "/Library/Frameworks/R.framework/Resources/share/texmf/tex/latex/tex", : > Running 'texi2dvi' on '/Users/XXX/Desktop/test/body.tex' failed. > Output: > You don't have a working TeX binary (tex) installed anywhere in > your PATH, and texi2dvi cannot proceed without one. If you want to use > this script, you'll need to install TeX (if you don't have it) or change > your PATH or TEX environment variable (if you do). See the --help > output for more details. > > For information about obtaining TeX, please see http://www.tug.org. If > you happen to be using Debian, you can get it with this command: > apt-get install tetex-bin > __________________________________________________________________________ > > > So the output tells us it cannot find a a tex binary file. If I open my > terminal and type in: > > /Library/Frameworks/R.framework/Resources/share/texmf/tex/latex/tex > > I get the following output: > > This is pdfTeX, Version 3.1415926-2.3-1.40.12 (TeX Live 2011) > ** > > which means the tex binary is there. 
> > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-run-Bibtex-with-pdfLatex-in-StatEt-MikTex-on-Windows-tp2271396p3868063.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From benaragamad at yahoo.com Mon Oct 3 18:55:42 2011 From: benaragamad at yahoo.com (dilshan benaragama) Date: Mon, 3 Oct 2011 09:55:42 -0700 (PDT) Subject: [R] distance coefficient for amatrix with ngative valus Message-ID: <1317660942.97644.YahooMailNeo@web65917.mail.ac4.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jcaplan at AESOP.Rutgers.edu Mon Oct 3 17:09:51 2011 From: jcaplan at AESOP.Rutgers.edu (Josh Caplan) Date: Mon, 03 Oct 2011 11:09:51 -0400 Subject: [R] Compact letter display for interaction effects Message-ID: <4E89D03F.30703@aesop.rutgers.edu> Hello, I am interested in generating a compact letter display for the results of Tukey HSD tests that contain interaction effects. The 'cld' method in the 'multcomp' package seems only to work for main effects. Does such a thing exist already? Thank you for any thoughts, Josh -- Joshua Caplan, PhD Postdoctoral Associate Department of Ecology, Evolution, and Natural Resources Rutgers University 14 College Farm Rd, New Brunswick, NJ 08901 p 732-932-9383 f 732-932-8746 jcaplan at aesop.rutgers.edu From francy.casalino at gmail.com Mon Oct 3 17:14:27 2011 From: francy.casalino at gmail.com (francy) Date: Mon, 3 Oct 2011 08:14:27 -0700 (PDT) Subject: [R] Import in R with White Spaces Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From ghubona at gmail.com Mon Oct 3 17:21:46 2011 From: ghubona at gmail.com (Geoffrey Hubona) Date: Mon, 3 Oct 2011 11:21:46 -0400 Subject: [R] Online Course PLS and R and free, public videos Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Gareth.Liu-Evans at liverpool.ac.uk Mon Oct 3 17:29:02 2011 From: Gareth.Liu-Evans at liverpool.ac.uk (Liu Evans, Gareth) Date: Mon, 3 Oct 2011 15:29:02 +0000 Subject: [R] minimisation problem, two setups (nonlinear with equality constraints/linear programming with mixed constraints) Message-ID: Dear All, Thank you for the replies to my first thread here: http://r.789695.n4.nabble.com/global-optimisation-with-inequality-constraints-td3799258.html. So far the best result is achieved via a penalised objective function. This was suggested by someone on this list privately. I am still looking into some of the options mentioned in the original thread, but I have been advised that there may be better ways if I present the actual problem with a reproducible example. In principle the problem can be solved by linear programming, so I include code for my attempt at this via RGLPK. It says that there is no feasible solution, but the solution is known analytically in the case below. Here is the precise problem: Minimise, over 100×1 real vectors v, max_i |v_i| such that X'v = e_2, where X is a given 100×2 matrix and e_2 = (0,1)'. The v_i are the elements of v. I have put the actual X matrix at the end of this post, along with a feasible starting value for v. The correct minimum is 0.01287957, obtained with v_i = -0.01287957 for i <= 50 and v_i = 0.01287957 for i >= 51. Here is the code for the penalised objective function approach, adapted very slightly from what someone on this list sent me: ......................................................................
X <- # See end of this message for the X data
x1 <- X[, 1]
x2 <- X[, 2]
fun <- function(v) {
  mu <- 0.1
  max(abs(v)) + (sum(v*x1)^2 + (1-sum(x2*v))^2)/(2*mu)
}
vstart <- # feasible starting value. See end of this post.
sol <- optim(vstart, fun, method="L-BFGS-B", lower=rep(-1, 100), upper=rep(1,100))
max(abs(sol$par))

......................................................................... This gets quite near, around 0.015-0.016 for me by varying mu. Alternatively the problem can be set up as a linear programming task as follows: Minimise q, over 100×1 real vectors v and scalars q >= 0, such that, for i = 1, ..., 100,

v_i <= q
v_i >= -q
X'v = e_2

Here is my RGLPK code: .........................................................................

library(Rglpk)
X <- # See end of this message for the X data
XROWS <- 100
XCOLS <- 2
e_2 <- rep(0, times=XCOLS)
e_2[2] <- 1
obj <- c(rep(0,XROWS),1) # coefficients on v_1, . . . , v_100, q.
mat <- matrix(rbind(cbind(diag(XROWS), rep(-1,XROWS)),
                    cbind(diag(XROWS), rep(1,XROWS)),
                    cbind(t(X), rep(0,XCOLS)),
                    cbind(t(rep(0,XROWS)), 1)),
              nrow=2*XROWS+XCOLS+1)
dir <- c(rep("<=", XROWS), rep(">=", XROWS), rep("==", XCOLS), ">=")
rhs <- c(rep(0, 2*XROWS), e_2, 0)
sol <- Rglpk_solve_LP(obj, mat, dir, rhs, types = NULL, max = FALSE,
                      bounds = c(-5,5), verbose = TRUE)

........................................................................... The output is:

GLPK Simplex Optimizer, v4.42
203 rows, 101 columns, 601 non-zeros
      0: obj =  0.000000000e+000  infeas = 1.000e+000 (2)
      4: obj =  0.000000000e+000  infeas = 1.000e+000 (1)
PROBLEM HAS NO FEASIBLE SOLUTION

I have also tried setting the problem up with a small interval around the equality constraints rather than having strict equalities, but could not get the correct solution this way either. Maybe I am making an error with RGLPK - I have been told it should work for a problem of this size and much larger. I have also tried DEoptim, lpSolve and constrOptim.
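The analytic solution quoted in the post can be checked directly in base R: set v_i = -c where the second column of X is negative and +c where it is positive, with c = 1 / sum(|x2|), so that X'v = e_2 holds, and a short bound shows that no feasible v can have a smaller maximum. The second column is typed in from the data posted with this message (only its positive half, since the posted column is symmetric about zero). Separately, and as an observation rather than something established in the thread: `Rglpk_solve_LP()` gives every variable a lower bound of 0 unless its `bounds` argument (documented as a list with `lower` and `upper` components) says otherwise, and with v >= 0 the constraint sum(v) = 0 forces v = 0, so X'v = e_2 cannot hold. That may be what the "PROBLEM HAS NO FEASIBLE SOLUTION" message reflects.

```r
# Second column of X as posted (the first column is all ones). The posted column
# is symmetric about zero, so only its positive half is typed in here.
pos <- c(0.012409369, 0.037235755, 0.062085116, 0.08697288, 0.111914639,
         0.136926226, 0.162023779, 0.187223821, 0.212543342, 0.237999879,
         0.263611615, 0.289397474, 0.315377237, 0.341571661, 0.368002611,
         0.39469322, 0.421668052, 0.448953298, 0.476576998, 0.504569287,
         0.532962693, 0.561792466, 0.591096977, 0.62091817, 0.651302112,
         0.682299633, 0.713967098, 0.746367337, 0.779570774, 0.813656808,
         0.848715527, 0.884849841, 0.922178178, 0.960837931, 1.000989917,
         1.042824239, 1.086568115, 1.13249653, 1.180947041, 1.232340861,
         1.287213733, 1.346262665, 1.410419531, 1.480972651, 1.559779992,
         1.649672679, 1.755300501, 1.885177032, 2.057855981, 2.330078923)
x2  <- c(-rev(pos), pos)
X   <- cbind(1, x2)
e_2 <- c(0, 1)

# Candidate from the post: -c where x2 < 0 and +c where x2 > 0, with c = 1 / sum(|x2|)
# so that sum(x2 * v) = 1; sum(v) = 0 because there are 50 values of each sign.
v <- sign(x2) / sum(abs(x2))

# Feasibility: X'v should equal e_2 up to floating-point rounding.
feasible <- isTRUE(all.equal(as.vector(crossprod(X, v)), e_2))
feasible

# Objective value (the post reports 0.01287957).
max(abs(v))

# Lower bound for any feasible v: because sum(v) = 0, for every constant a
#   1 = sum(x2 * v) = sum((x2 - a) * v) <= max(abs(v)) * sum(abs(x2 - a)).
# Taking a = 0 gives max(abs(v)) >= 1 / sum(abs(x2)), which the candidate attains.
lower_bound <- 1 / sum(abs(x2))
all.equal(max(abs(v)), lower_bound)
```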
Regards, Gareth ............................................................................... Values of X and vstart below ............................................................................... vstart= -0.025251183 -0.022301089 -0.020429759 -0.01902228 -0.017877586 -0.016903415 -0.016049376 -0.015284788 -0.014589517 -0.0139496 -0.01335494 -0.012797983 -0.012272922 -0.011775194 -0.011301139 -0.010847778 -0.010412649 -0.009993692 -0.009589163 -0.009197573 -0.00881764 -0.008448248 -0.008088421 -0.007737298 -0.007394116 -0.007058194 -0.006728922 -0.006405748 -0.006088173 -0.005775744 -0.005468043 -0.005164689 -0.00486533 -0.004569638 -0.00427731 -0.003988063 -0.003701629 -0.003417759 -0.003136215 -0.002856772 -0.002579217 -0.002303343 -0.002028954 -0.00175586 -0.001483876 -0.001212825 -0.00094253 -0.00067282 -0.000403526 -0.000134481 0.000134481 0.000403526 0.00067282 0.00094253 0.001212825 0.001483876 0.00175586 0.002028954 0.002303343 0.002579217 0.002856772 0.003136215 0.003417759 0.003701629 0.003988063 0.00427731 0.004569638 0.00486533 0.005164689 0.005468043 0.005775744 0.006088173 0.006405748 0.006728922 0.007058194 0.007394116 0.007737298 0.008088421 0.008448248 0.00881764 0.009197573 0.009589163 0.009993692 0.010412649 0.010847778 0.011301139 0.011775194 0.012272922 0.012797983 0.01335494 0.0139496 0.014589517 0.015284788 0.016049376 0.016903415 0.017877586 0.01902228 0.020429759 0.022301089 0.025251183 ..................................................................................................................................... 
X= 1 -2.330078923 1 -2.057855981 1 -1.885177032 1 -1.755300501 1 -1.649672679 1 -1.559779992 1 -1.480972651 1 -1.410419531 1 -1.346262665 1 -1.287213733 1 -1.232340861 1 -1.180947041 1 -1.13249653 1 -1.086568115 1 -1.042824239 1 -1.000989917 1 -0.960837931 1 -0.922178178 1 -0.884849841 1 -0.848715527 1 -0.813656808 1 -0.779570774 1 -0.746367337 1 -0.713967098 1 -0.682299633 1 -0.651302112 1 -0.62091817 1 -0.591096977 1 -0.561792466 1 -0.532962693 1 -0.504569287 1 -0.476576998 1 -0.448953298 1 -0.421668052 1 -0.39469322 1 -0.368002611 1 -0.341571661 1 -0.315377237 1 -0.289397474 1 -0.263611615 1 -0.237999879 1 -0.212543342 1 -0.187223821 1 -0.162023779 1 -0.136926226 1 -0.111914639 1 -0.08697288 1 -0.062085116 1 -0.037235755 1 -0.012409369 1 0.012409369 1 0.037235755 1 0.062085116 1 0.08697288 1 0.111914639 1 0.136926226 1 0.162023779 1 0.187223821 1 0.212543342 1 0.237999879 1 0.263611615 1 0.289397474 1 0.315377237 1 0.341571661 1 0.368002611 1 0.39469322 1 0.421668052 1 0.448953298 1 0.476576998 1 0.504569287 1 0.532962693 1 0.561792466 1 0.591096977 1 0.62091817 1 0.651302112 1 0.682299633 1 0.713967098 1 0.746367337 1 0.779570774 1 0.813656808 1 0.848715527 1 0.884849841 1 0.922178178 1 0.960837931 1 1.000989917 1 1.042824239 1 1.086568115 1 1.13249653 1 1.180947041 1 1.232340861 1 1.287213733 1 1.346262665 1 1.410419531 1 1.480972651 1 1.559779992 1 1.649672679 1 1.755300501 1 1.885177032 1 2.057855981 1 2.330078923 From bjw78 at well.ox.ac.uk Mon Oct 3 17:40:19 2011 From: bjw78 at well.ox.ac.uk (Benjamin Wright) Date: Mon, 3 Oct 2011 15:40:19 +0000 Subject: [R] Parsing variable-length delimited strings into a matrix Message-ID: <0D642FB35639AA4996C2E5782E10FBC601162E@exchange01.well.ox.ac.uk> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From sandeep.coepcivil at gmail.com Mon Oct 3 18:16:20 2011 From: sandeep.coepcivil at gmail.com (Sandeep Patil) Date: Mon, 3 Oct 2011 11:16:20 -0500 Subject: [R] Installation from local Compiled directory Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From MikeP at kfoc.net Mon Oct 3 18:31:39 2011 From: MikeP at kfoc.net (Mike Pfeiff) Date: Mon, 3 Oct 2011 16:31:39 +0000 Subject: [R] read .csv from web from password protected site Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dyeager at gmail.com Mon Oct 3 19:07:18 2011 From: dyeager at gmail.com (davidyeager) Date: Mon, 3 Oct 2011 10:07:18 -0700 (PDT) Subject: [R] Meta-analysis of test statistics in "metafor" package? Message-ID: <1317661638576-3868126.post@n4.nabble.com> Hi - I am conducting a meta-analysis and I have a matrix of f-statistics, Ns and dfs from a series of studies that tested for an interaction in a 2x2 anova. I'd like to test whether the 2x2 interaction is significant in the aggregate. Similarly, I have a matrix of chi-square statistics that I'd like to meta-analyze. How can I input these test statistics into the "metafor" package and conduct these meta-analyses? I see how to input raw data or standardized effect sizes (e.g., d) into "metafor", but not test statistics. Best, David -- View this message in context: http://r.789695.n4.nabble.com/Meta-analysis-of-test-statistics-in-metafor-package-tp3868126p3868126.html Sent from the R help mailing list archive at Nabble.com. From Sam.Cable at kirtland.af.mil Mon Oct 3 19:19:35 2011 From: Sam.Cable at kirtland.af.mil (Cable, Sam B Civ USAF AFMC AFRL/RVBXI) Date: Mon, 3 Oct 2011 11:19:35 -0600 Subject: [R] file input with readLines Message-ID: <39B5ED61E7BFC24FA8277B6DE92A9A3F0415418D@fkimlki01.enterprise.afmc.ds.af.mil> An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From dgou at mac.com Mon Oct 3 19:23:00 2011 From: dgou at mac.com (Douglas Philips) Date: Mon, 03 Oct 2011 13:23:00 -0400 Subject: [R] xts/time-series and plot questions... Message-ID: Hello, I'm a complete newbie to R. Spent this past weekend reading The Art of R Programming, The R Cookbook, the language spec, Wikis and FAQs. I sort-of have my head around R; the dizzying selection of libraries, packages, etc.? Not really. I've probably missed or failed to understand something... I have a very simple data set. Two years (ish) of temperature data, collected and time-stamped every 10 minutes. Sample of the data from read.csv(...):
> head(temp.data)
             DateTime Temperature
1 2009-11-23 23:20:00        62.9
2 2009-11-23 23:30:00        63.4
3 2009-11-23 23:40:00        63.6
4 2009-11-23 23:50:00        64.2
5 2009-11-24 00:00:00        64.5
6 2009-11-24 00:10:00        64.7
Converted to an xts object:
> str(temp_data)
xts [1:83089, 1] 62.9 63.4 63.6 64.2 64.5 64.7 65.2 65.3 65.8 65.6 ...
 - attr(*, "index")= atomic [1:83089] 1.26e+09 1.26e+09 1.26e+09 1.26e+09 1.26e+09 ...
  ..- attr(*, "tzone")= chr ""
  ..- attr(*, "tclass")= chr [1:2] "POSIXct" "POSIXt"
 - attr(*, "class")= chr [1:2] "xts" "zoo"
 - attr(*, ".indexCLASS")= chr [1:2] "POSIXct" "POSIXt"
 - attr(*, ".indexTZ")= chr ""
So far so good! I can do all kinds of cool things like plot individual months: plot(temp_data["2009-12"]) and plot monthly, weekly, daily mean, sd, etc. (I have to say, xts has been a dream to work with!) What I would like to do is plot several sections of this data on the same graph. Specifically, I would like to plot all the data from one calendar month, regardless of year, on one plot, i.e. one line for Jan 2009, another line for Jan 2010, another line for Jan 2011, etc. I can use xts functions to slice the data into months (or weeks, or days), but I am not sure how to arrange to get the X-axis to work right.
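Here is a minimal base-R sketch of one way to put the same month from different years on a shared x-axis. Each reading is plotted against the days elapsed since the start of its own month, with one line per year. It assumes a data frame shaped like temp.data (a POSIXct DateTime column and a numeric Temperature column). The data below are made up for illustration, and all times are handled in UTC so that month boundaries are unambiguous. For the xts object itself, index(temp_data) and coredata(temp_data) should supply the same two columns.

```r
# Made-up stand-in for temp.data: 10-minute readings for January 2010 and 2011.
jan_stamps <- function(year) {
  seq(as.POSIXct(sprintf("%d-01-01 00:00", year), tz = "UTC"),
      as.POSIXct(sprintf("%d-01-31 23:50", year), tz = "UTC"), by = "10 min")
}
stamps <- c(jan_stamps(2010), jan_stamps(2011))
temp.data <- data.frame(DateTime = stamps,
                        Temperature = 60 + 5 * sin(seq_along(stamps) / 144))

# Year-free x coordinate: days elapsed since the first of the month, plus 1.
month_start <- as.POSIXct(format(temp.data$DateTime, "%Y-%m-01", tz = "UTC"),
                          tz = "UTC")
temp.data$day_of_month <- 1 + as.numeric(difftime(temp.data$DateTime, month_start,
                                                  units = "days"))
temp.data$year  <- format(temp.data$DateTime, "%Y", tz = "UTC")
temp.data$month <- format(temp.data$DateTime, "%m", tz = "UTC")

# All Januaries on one set of axes, one line per year.
jan <- temp.data[temp.data$month == "01", ]
by_year <- split(jan, jan$year)
plot(NA, xlim = c(1, 32), ylim = range(jan$Temperature),
     xlab = "Day of month", ylab = "Temperature")
for (k in seq_along(by_year)) {
  lines(by_year[[k]]$day_of_month, by_year[[k]]$Temperature, col = k)
}
legend("topright", legend = names(by_year), col = seq_along(by_year), lty = 1)
```

To compare a different month, change "01". For a quarter or a whole year, the same idea works with days elapsed since the start of that period.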
If I do: plot(temp_data["2010-01"]); lines(temp_data["2011-01"]), the lines aren't overlaid; the output from lines() is lost because it is far off to the right of the plot, as the plot auto-ranged the x and y axes. But I don't think xlim is my problem so much as I need a way to 'slide' temp_data["2011-01"] so that it will appear in the same part of the graph/plot as the 2010-01 data does. What I think I want to do is write a "normalizing" function that takes data for any given month and makes it "year-free"??? This way I could plot corresponding months on the same graph. One month, or a quarter, or even a full year. I don't know, however, how to convince xts to ignore the year, or if there is an xts-compatible object that is year-free (or month-free for looking at week/day segments)... Pointers to online answers, Google search terms, etc. greatly appreciated! Thanks, -=Doug From francy.casalino at gmail.com Mon Oct 3 19:54:47 2011 From: francy.casalino at gmail.com (francy) Date: Mon, 3 Oct 2011 10:54:47 -0700 (PDT) Subject: [R] Merge two data frames and find common values and non-matching values Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Sam.Cable at kirtland.af.mil Mon Oct 3 20:26:55 2011 From: Sam.Cable at kirtland.af.mil (Cable, Sam B Civ USAF AFMC AFRL/RVBXI) Date: Mon, 3 Oct 2011 12:26:55 -0600 Subject: [R] file input with readLines Message-ID: <39B5ED61E7BFC24FA8277B6DE92A9A3F041541B6@fkimlki01.enterprise.afmc.ds.af.mil> More on my previous question ... I have put in timing statements to try to get a better idea of where the problem is, like so:
conn<-file('filename','r')
for (chunk in 1:100000) {
   print(paste('begin read at',date()))
   Lines<-readLines(conn,n=25)
   print(paste('begin processing at',date()))
   # process "Lines"
   print(paste('end loop at',date()))
}
Every time I go through the loop, all the date() functions return *exactly* the same time!
It *looks like* it runs through each iteration very quickly and then takes longer and longer to simply start the next iteration. I don't believe this. I think R must be doing some kind of latency trick or something. But, anyway, the point is that I was assuming the problem was in the I/O, and now I don't know if it's I/O or processing. Either way, I don't understand it and would really appreciate some wisdom from you guys. Thanks. From batholdy at googlemail.com Mon Oct 3 20:47:42 2011 From: batholdy at googlemail.com (Martin Batholdy) Date: Mon, 3 Oct 2011 20:47:42 +0200 Subject: [R] intensity map / plot In-Reply-To: <4E89ECDD.8070809@statistik.tu-dortmund.de> References: <1D088B40-E7EB-4B9E-A19F-97EC47732DEA@googlemail.com> <4E89ECDD.8070809@statistik.tu-dortmund.de> Message-ID: <1F56F14C-27EC-4B5A-8C3B-082CC74A147B@googlemail.com> thanks! On 03.10.2011, at 19:11, Uwe Ligges wrote: > See ?image > > Uwe Ligges > > > On 03.10.2011 19:05, Martin Batholdy wrote: >> Dear R-list, >> >> >> I would like to generate an intensity map based on a x * y matrix. >> >> Each point in the matrix should get plotted at the coordinate: x = column / y = row with >> a color-intensity (for example gray-value) based on the actual value of this point. >> >> Is there a convenient package / function for this kind of plot? >> >> >> >> example data: >> x<- matrix(rnorm(100*100, 0, 2), 100, 100) >> >> >> thanks! >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. From kevin.thorpe at utoronto.ca Mon Oct 3 20:54:51 2011 From: kevin.thorpe at utoronto.ca (Kevin E. Thorpe) Date: Mon, 03 Oct 2011 14:54:51 -0400 Subject: [R] Meta-analysis of test statistics in "metafor" package? 
In-Reply-To: <1317661638576-3868126.post@n4.nabble.com> References: <1317661638576-3868126.post@n4.nabble.com> Message-ID: <4E8A04FB.9020405@utoronto.ca> On 10/03/2011 01:07 PM, davidyeager wrote: > Hi - > > I am conducting a meta-analysis and I have a matrix of f-statistics, Ns and > dfs from a series of studies that tested for an interaction in a 2x2 anova. > I'd like to test whether the 2x2 interaction is significant in the > aggregate. > > Similarly, I have a matrix of chi-square statistics that I'd like to > meta-analyze. > > How can I input these test statistics into the "metafor" package and conduct > these meta-analyses? I see how to input raw data or standardized effect > sizes (e.g., d) into "metafor", but not test statistics. > > Best, > > David > I believe the MAd package may have what you are looking for. It can call metafor for you. Kevin -- Kevin E. Thorpe Biostatistician/Trialist, Applied Health Research Centre (AHRC) Li Ka Shing Knowledge Institute of St. Michael's Assistant Professor, Dalla Lana School of Public Health University of Toronto email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016 From jianfeng.mao at gmail.com Mon Oct 3 21:25:27 2011 From: jianfeng.mao at gmail.com (Mao Jianfeng) Date: Mon, 3 Oct 2011 21:25:27 +0200 Subject: [R] generating Venn diagram with 6 sets In-Reply-To: <695BFFF8-7143-49FE-8A88-8DCA1B2C34B9@gmail.com> References: <695BFFF8-7143-49FE-8A88-8DCA1B2C34B9@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From wdunlap at tibco.com Mon Oct 3 21:25:35 2011 From: wdunlap at tibco.com (William Dunlap) Date: Mon, 3 Oct 2011 19:25:35 +0000 Subject: [R] Merge two data frames and find common values and non-matching values In-Reply-To: References: Message-ID: Start out with merge():
> df <- merge(df1, df2, all.x=TRUE) # could add by="location" for emphasis
> df
  location      Name Position Country
1       36  cristina        B
2       75 francesca        A      UK
You could make your 'Match' column from is.na(df$Country) if you knew that df2$Country were never NA. Otherwise you can add a fake variable to the merge to tell which output rows come from unmatched rows in the first data.frame:
> df12 <- merge(df1, cbind(df2, fromDF2=TRUE), all.x=TRUE, by="location")
> df12$Match <- !is.na(df12$fromDF2)
> df12
  location      Name Position Country fromDF2 Match
1       36  cristina        B              NA FALSE
2       75 francesca        A      UK    TRUE  TRUE
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of francy > Sent: Monday, October 03, 2011 10:55 AM > To: r-help at r-project.org > Subject: [R] Merge two data frames and find common values and non-matching values > > Hi, > > I am trying to find a function to match two data frames of different lengths > for one field only. > So, for example, > df1 is: > > Name Position location > francesca A 75 > cristina B 36 > > And df2 is: > > location Country > 75 UK > 56 Austria > > And I would like to match on "Location" and the output to be something like: > > Name Position Location Match > francesca A 75 1 > cristina B 36 0 > > I have tried with the function 'match' or with: > subset(df1, location %in% df2) > But it does not work. > > Could you please help me figure out how to do this? > > Thank you!
> -f > > > -- > View this message in context: http://r.789695.n4.nabble.com/Merge-two-data-frames-and-find-common-values-and-non-matching-values-tp3868299p3868299.html > Sent from the R help mailing list archive at Nabble.com. From sarah.goslee at gmail.com Mon Oct 3 21:27:16 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 3 Oct 2011 15:27:16 -0400 Subject: [R] Merge two data frames and find common values and non-matching values In-Reply-To: References: Message-ID: Hi, On Mon, Oct 3, 2011 at 1:54 PM, francy wrote: > Hi, > > I am trying to find a function to match two data frames of different lengths > for one field only. > So, for example, > df1 is: > > Name Position location > francesca A 75 > cristina B 36 > > And df2 is: > > location Country > 75 UK > 56 Austria > > And I would like to match on "Location" and the output to be something like: Sounds like you need merge() (just as in your subject line!). > Name Position Location Match > francesca A 75 1 > cristina B 36 0 > > I have tried with the function 'match' or with: > subset(df1, location %in% df2) > But it does not work. > > Could you please help me figure out how to do this?
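Here is a small base-R sketch of the 0/1 Match column asked for in this thread. It rebuilds the two example data frames from the question. The posted subset(df1, location %in% df2) most likely fails because it compares location against df2 as a whole (a list of columns) rather than against the df2$location column.

```r
# The example data from the question
df1 <- data.frame(Name = c("francesca", "cristina"),
                  Position = c("A", "B"),
                  location = c(75, 36))
df2 <- data.frame(location = c(75, 56),
                  Country = c("UK", "Austria"))

# 1 if df1's location occurs in df2's location column, else 0
df1$Match <- as.integer(df1$location %in% df2$location)
print(df1)

# Left join that also keeps df2's columns; unmatched rows get NA for Country
joined <- merge(df1, df2, by = "location", all.x = TRUE)
print(joined)
```

The %in% route keeps df1's row order and adds only the flag. merge() sorts by the key and brings df2's columns along.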
> Sarah -- Sarah Goslee http://www.functionaldiversity.org From sarah.goslee at gmail.com Mon Oct 3 21:25:53 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 3 Oct 2011 15:25:53 -0400 Subject: [R] read .csv from web from password protected site In-Reply-To: References: Message-ID: Hi Mike, On Mon, Oct 3, 2011 at 12:31 PM, Mike Pfeiff wrote: > I am very new to R and have been struggling trying to read a basic ".csv" file from a password protected site with the following code: > > myURL ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt" > test2=read.table(url(myURL),header=TRUE,sep=",") > A 'data.frame' is returned into the workspace, however it is not the data contained in the ".csv" file. I think this occurs because the website where I am trying to retrieve the data is password protected. > > Is there a way to specify the username and password? I'd try first read.table("http://userid:password at my.url/file.csv"), which is the standard way to do it (hint: try that form in your web browser and see whether you can access the data), and if that doesn't work look into the RCurl package. The list archives have a fair bit of information on this topic. Sarah > Any guidance would be greatly appreciated. > > Sincerely, > > Mike > -- Sarah Goslee http://www.functionaldiversity.org From sarah.goslee at gmail.com Mon Oct 3 21:30:19 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 3 Oct 2011 15:30:19 -0400 Subject: [R] Import in R with White Spaces In-Reply-To: References: Message-ID: Hi, On Mon, Oct 3, 2011 at 11:14 AM, francy wrote: > Hi, > > I have a simple question about importing data, I would be very grateful if > you could help me out. > > I have used read.csv(file name, header=T, sep=",") to bring in a csv file I > saved in MS Excel. The problem is I have white spaces in the middle of values > (not in the column names), and this messes up the column entries.
Since I > have many many files that I am importing and I have spaces in all of them, I > was looking for a way to avoid going into all of them and changing the white > space to, for example, an underscore. > Can you suggest whether there is a way to tell R that each element delimited > by "," is actually a different entry, regardless of whether there are white > spaces in between? Since you're exporting from Excel, make sure that quoting is turned on in the export options. That way an entire value, including white space, will be in quotes and thus read as an entire value. That should have been done by default, and specifying sep as you did should have worked. So actually, we may need more information beyond what you've provided. Sarah -- Sarah Goslee http://www.functionaldiversity.org From Thomas.Adams at noaa.gov Mon Oct 3 21:44:11 2011 From: Thomas.Adams at noaa.gov (Thomas Adams) Date: Mon, 03 Oct 2011 15:44:11 -0400 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: References: <4E89EFE3.5050804@noaa.gov> Message-ID: <4E8A108B.9070507@noaa.gov> Andrés, Thank you for your help, but that does not capture what I'm looking for. I need to be able to control the shaded bound limits and they need to be coincident. Tom On 10/3/11 3:37 PM, Andrés Aragón wrote: > Hi, > Try something like this: > > c<- ggplot(mtcars, aes(qsec, mpg, colour=factor(cyl))) > c + stat_smooth(aes(group=cyl))+stat_smooth(aes(fill=factor(cyl)))+geom_point() > > > Andrés AM > > > > 2011/10/3, Thomas Adams: >> I'm interested in creating a graphic -like- this: >> >> c<- ggplot(mtcars, aes(qsec, wt)) >> c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2, >> alpha = 0.2) >> >> but I need to show 2 sets of bands (with different shading) using 5%, >> 25%, 75%, 95% limits that I specify and where the heavy blue line is the >> median. I don't understand how to do this with ggplot2.
What I am doing >> currently is to generate 'boxplots' (with 5%, 25%, 75%, 95% limits) at >> 6-hourly time steps (so I have a series of boxplots, which you can see >> by clicking on a map point: >> http://www.erh.noaa.gov/mmefs/index_test.php?Lat=38.2&Lon=-80.1&Zoom=5&Refresh=0&RFCOverlay=0&Model=NAEFS). >> Some who use our graphics would like to see something more like the >> ggplot2 with stat_smooth graphic. >> >> Help is much appreciated. >> >> Regards, >> Tom >> >> -- >> Thomas E Adams >> National Weather Service >> Ohio River Forecast Center >> 1901 South State Route 134 >> Wilmington, OH 45177 >> >> EMAIL: thomas.adams at noaa.gov >> >> VOICE: 937-383-0528 >> FAX: 937-383-0033 -- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: thomas.adams at noaa.gov VOICE: 937-383-0528 FAX: 937-383-0033 From spencer.graves at structuremonitoring.com Mon Oct 3 21:48:27 2011 From: spencer.graves at structuremonitoring.com (Spencer Graves) Date: Mon, 03 Oct 2011 12:48:27 -0700 Subject: [R] Compact letter display for interaction effects In-Reply-To: <4E89D03F.30703@aesop.rutgers.edu> References: <4E89D03F.30703@aesop.rutgers.edu> Message-ID: <4E8A118B.60605@structuremonitoring.com> > library(sos) > cl <- ???'compact letter display' > cl This opened a table for me in a web browser identifying 3 help pages in 'multcomp' and 1 in 'multcompView' that mention 'compact letter display'. Functions in multcompView will work with anything that looks like a distance or similarity matrix. You need to decide what to use so the results make sense. Hope this helps.
Spencer On 10/3/2011 8:09 AM, Josh Caplan wrote: > Hello, > > I am interested in generating a compact letter display for the results > of Tukey HSD tests that contain interaction effects. The 'cld' method > in the 'multcomp' package seems only to work for main effects. Does > such a thing exist already? Thank you for any thoughts, > > Josh > -- Spencer Graves, PE, PhD President and Chief Technology Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567 web: www.structuremonitoring.com From joseph.g.boyer at gsk.com Mon Oct 3 21:46:40 2011 From: joseph.g.boyer at gsk.com (Joseph Boyer) Date: Mon, 3 Oct 2011 19:46:40 +0000 Subject: [R] suggestions argument in rbga function in genalg package In-Reply-To: <4E7C1090.6090102@yahoo.de> References: <4E7C1090.6090102@yahoo.de> Message-ID: Enrico, The idea of looking at the code never occurred to me. That's a great idea which will come in useful, I'm sure. In this particular case using list() or as.matrix() does not solve the problem, but those are both good ideas. Thanks for the reply. -----Original Message----- From: Enrico Schumann [mailto:enricoschumann at yahoo.de] Sent: Friday, September 23, 2011 12:53 AM To: Joseph Boyer Cc: r-help at r-project.org Subject: Re: [R] suggestions argument in rbga function in genalg package I do not use this package, but a quick look at the code shows this:
if (!is.null(suggestions)) {
    # [deleted]
    suggestionCount = dim(suggestions)[1]
So 'suggestions' needs to have a dim attribute (while the documentation speaks of an 'optional list of suggested chromosomes'). You could try
as.matrix(c(1,0.1,10, 100,1,100,1))
But I cannot tell if that solves your problem since you have not provided your objective function (i.e., you have not provided the "commented, minimal, self-contained, reproducible code" that the footer of this message speaks about).
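The shapes involved can be checked in base R, without genalg. Going by the excerpt quoted above (suggestionCount = dim(suggestions)[1]), rbga() appears to read one suggested chromosome per row. On that reading, a 1 x 7 matrix would be the natural input, rather than a list or a 7 x 1 column. That layout is an inference from the excerpt, not documented behaviour, so it is worth confirming in the package source.

```r
start <- c(1, 0.1, 10, 100, 1, 100, 1)

# A list has no dim attribute, so dim(...)[1] is NULL and 1:NULL fails
# with "argument of length 0", matching the reported error.
print(dim(list(start)))        # NULL

# as.matrix() on a vector gives a 7 x 1 matrix: seven one-gene rows.
print(dim(as.matrix(start)))   # 7 1

# One chromosome per row: a 1 x 7 matrix (rbind(start) has the same shape).
suggestions <- matrix(start, nrow = 1)
print(dim(suggestions))        # 1 7

# Hypothetical call, with Lower, Upper and evaluate as in the question:
# GenFit <- rbga(Lower, Upper, suggestions = suggestions, evalFunc = evaluate)
```

Several suggested chromosomes would then be stacked with rbind(), one per row.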
Regards, Enrico On 22.09.2011 20:46, Joseph Boyer wrote: > Would someone be so kind as to provide example code where they use the suggestions argument in the rbga function > in genalg? I can't get it to work. > > The following code works just fine: > > GenFit<-rbga(Lower, Upper, evalFunc = evaluate) > > Lower and Upper are each numeric vectors with 7 elements. Evaluate is an objective function. > However, when I want to use a suggested chromosome, I get an error message. My code is > > start<- c(1,0.1,10, 100,1,100,1) > > suggestions<- list(start) > > GenFit<-rbga(Lower, Upper, suggestions = suggestions, evalFunc = evaluate) > > The error message is: > > Error in 1:suggestionCount : argument of length 0 > > Thanks. > -- Enrico Schumann Lucerne, Switzerland http://nmof.net/ From sarah.goslee at gmail.com Mon Oct 3 22:06:42 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 3 Oct 2011 16:06:42 -0400 Subject: [R] read .csv from web from password protected site In-Reply-To: References: Message-ID: Hi, I've assumed that you meant to send this to the R-help list, and not just me. On Mon, Oct 3, 2011 at 3:54 PM, Mike Pfeiff wrote: > Sarah, Thanks for the suggestion. However, read.table("http://userid:password at my.url/file.csv") did not work, as it returned the following: > > Error in file(file, "rt") : cannot open the connection > In addition: Warning message: > In file(file, "rt") : unable to resolve 'userid' > > (where 'userid' is my actual userid) > > I've tried the following RCurl commands... > > myURL ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt" >
h=getURL(myURL, userpw = "userid:password", followlocation=TRUE) > test=read.table(h,header=TRUE,sep=",") > > ...and I can't get the data to read and get the following errors: > > Error in file(file, "rt") : cannot open the connection > In addition: Warning message: > In file(file, "rt") : unable to resolve 'userid' > > I'm at a total loss. Any assistance anyone could provide would be greatly appreciated. You can get to the file using your web browser and that userid/password combo, right? Do you have to go through a dialog box? Press a button to log in? Any of those have the potential to complicate the task. If so, you'll need to work through the Forms section at http://www.omegahat.org/RCurl/philosophy.html Sarah > > > -----Original Message----- > From: Sarah Goslee [mailto:sarah.goslee at gmail.com] > Sent: Monday, October 03, 2011 2:26 PM > To: Mike Pfeiff > Cc: r-help at r-project.org > Subject: Re: [R] read .csv from web from password protected site > > Hi Mike, > > On Mon, Oct 3, 2011 at 12:31 PM, Mike Pfeiff wrote: >> I am very new to R and have been struggling trying to read a basic ".csv" file from a password protected site with the following code: >> >> myURL ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt" >> test2=read.table(url(myURL),header=TRUE,sep=",") > > >> A 'data.frame' is returned into the workspace, however it is not the data contained in the ".csv" file. I think this occurs because the website where I am trying to retrieve the data is password protected. >> >> Is there a way to specify the username and password? > > > I'd try first > read.table("http://userid:password at my.url/file.csv"), which is the standard way to do it (hint: try that form in your web browser and see whether you can access the data), and if that doesn't work look into the RCurl package. The list archives have a fair bit of information on this topic.
> > Sarah > > >> Any guidance would be greatly appreciated. >> >> Sincerely, >> >> Mike >> > -- Sarah Goslee http://www.functionaldiversity.org From diggsb at ohsu.edu Mon Oct 3 22:07:32 2011 From: diggsb at ohsu.edu (Brian Diggs) Date: Mon, 3 Oct 2011 13:07:32 -0700 Subject: [R] Returning vector of values shared across 3 vectors? In-Reply-To: References: <1317448802.24157.YahooMailNeo@web160705.mail.bf1.yahoo.com> Message-ID: <4E8A1604.8030902@ohsu.edu> On 10/1/2011 3:03 AM, jim holtman wrote: > try this: > >> vec1<- c(4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81) >> vec2<- c (1,2,3,4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66) >> vec3<- c (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,52) >> intersect(vec1,intersect(vec2, vec3)) > [1] 4 5 6 7 8 9 10 11 12 52 >> Or if your real problem may go to more vecs and you don't want to keep nesting the calls: Reduce(intersect, list(vec1, vec2, vec3)) # [1] 4 5 6 7 8 9 10 11 12 52 > > On Sat, Oct 1, 2011 at 2:00 AM, Chris Conner wrote: >> Help-Rs, >> >> I've got three vectors representing participants: >> >> vec1<- c(4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81) >> vec2<- c (1,2,3,4,5,6,7,8,9,10,11,12,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66) >> vec3<- c (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,52) >> >> I'd like to return a vector that contains only the values that are shared across ALL THREE vectors. So the statement would return a vector that looked like this: >> 4,5,6,7,8,9,10,11,12,52 >> >> For some reason I initially thought that a cbind and a unique() would handle it, but then common sense sank in. I think the sleep deprivation is starting to take its toll. I've got to believe that there is a simple solution to this dilemma. >> >> Thanks in advance for any help!
>> C > -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University From michael.weylandt at gmail.com Mon Oct 3 22:11:38 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Mon, 3 Oct 2011 16:11:38 -0400 Subject: [R] file input with readLines In-Reply-To: <39B5ED61E7BFC24FA8277B6DE92A9A3F041541B6@fkimlki01.enterprise.afmc.ds.af.mil> References: <39B5ED61E7BFC24FA8277B6DE92A9A3F041541B6@fkimlki01.enterprise.afmc.ds.af.mil> Message-ID: If you are using rbind() at each iteration, that can slow things down greatly. Look up a document called the R Inferno which discusses this in great detail in circle 2. Michael Weylandt On Mon, Oct 3, 2011 at 2:26 PM, Cable, Sam B Civ USAF AFMC AFRL/RVBXI wrote: > More on my previous question ... > > I have put in timing statements to try to get a better idea of where the > problem is, like so: > > conn<-file('filename','r') > > for (chunk in 1:100000) { > print(paste('begin read at',date())) > Lines<-readLines(conn,n=25) > print(paste('begin processing at',date())) > # process "Lines" > print(paste('end loop at',date())) > } > > Every time I go through the loop, all the date() functions return > *exactly* the same time! It *looks like* it runs through each iteration > very quickly and then takes longer and longer to simply start the next > iteration. I don't believe this. I think R must be doing some kind of > latency trick or something. But, anyway, the point is that I was > assuming the problem was in the I/O, and now I don't know if it's I/O or > processing.
Either way, I don't understand it and would really > appreciate some wisdom from you guys. > > Thanks. > From michael.weylandt at gmail.com Mon Oct 3 22:15:23 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Mon, 3 Oct 2011 16:15:23 -0400 Subject: [R] Parsing variable-length delimited strings into a matrix In-Reply-To: <0D642FB35639AA4996C2E5782E10FBC601162E@exchange01.well.ox.ac.uk> References: <0D642FB35639AA4996C2E5782E10FBC601162E@exchange01.well.ox.ac.uk> Message-ID: Well, how do you want it to be made into a matrix if the rows are all different lengths? Methinks you are finding this tricky for a reason... Michael On Mon, Oct 3, 2011 at 11:40 AM, Benjamin Wright wrote: > > I'm struggling to find a way of parsing a vector of data in this sort of form: > > A,B,C > B,B > A,AA,C > A,B,BB,BBB,B,B > > into a matrix (or data frame). The catch is that I don't know a priori how many entries there will be in each element, nor how many characters there will be. strsplit(vec,",") gets me a list, but I can't find a way of turning the list into a matrix. unlist(lst) destroys the length data and do.call("rbind", lst) fails because of the uneven lengths. It is possible to go through the vector element by element, but that has proved too slow for my purposes. > > Is there a reasonably quick method of achieving this in a vector-oriented way? > > Cheers, > > Ben >
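One common way around the uneven lengths Michael points out is to pad every row with NA up to the longest one. Below is a sketch using the four example strings from Ben's post. It relies on the fact that indexing a vector past its end returns NA, which does the padding inside a single vapply() call instead of an explicit element-by-element loop.

```r
vec <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")
lst <- strsplit(vec, ",")

# The longest row sets the number of columns
width <- max(lengths(lst))

# x[seq_len(width)] pads shorter rows with NA; vapply returns one column
# per input string, so transpose to get one row per string.
mat <- t(vapply(lst, `[`, character(width), seq_len(width)))
print(mat)
```

as.data.frame(mat, stringsAsFactors = FALSE) turns the result into a data frame if one is preferred.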
[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >

From michael.weylandt at gmail.com Mon Oct 3 22:27:53 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Mon, 3 Oct 2011 16:27:53 -0400 Subject: [R] distance coefficient for amatrix with ngative valus In-Reply-To: <1317660942.97644.YahooMailNeo@web65917.mail.ac4.yahoo.com> References: <1317660942.97644.YahooMailNeo@web65917.mail.ac4.yahoo.com> Message-ID:

One order of the usual coming right up!

1 course of "Why does XXX not work for you?" a la francaise, where XXX is, in your case, the Euclidean distance. Specifically, any metric worth its salt (in a normed space) satisfies dist(a,b) = dist(a+c,b+c), so why are negative values a problem?...

2 sides: a "Minimal Working Example" with a light buttery sauce and a fried "what package/code are you using"

and, for dessert, a Winsemian special of: "read the posting guide!"

Michael Weylandt, who is putting together a menu for a fancy dinner even as he types

On Mon, Oct 3, 2011 at 12:55 PM, dilshan benaragama wrote: > Hi, > I need to run a PCoA (PCO) for a data set wich has both positive and negative values for variables. I could not find any distancecoefficient other than euclidean distace running for the data set. Are there any other coefficient works with negtive values.Also I cannot get summary out put (the eigen values) for PCO as for PCA. > > Thanks. > Dilshan >
[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > >

From josh.m.ulrich at gmail.com Mon Oct 3 22:55:02 2011 From: josh.m.ulrich at gmail.com (Joshua Ulrich) Date: Mon, 3 Oct 2011 15:55:02 -0500 Subject: [R] xts/time-series and plot questions... In-Reply-To: References: Message-ID:

Hi Doug,

Thanks for taking the time to write a great question.

On Mon, Oct 3, 2011 at 12:23 PM, Douglas Philips wrote: > Hello, > I'm a complete newbie to R. Spent this past weekend reading The Art of R Programming, The R Cookbook, the language spec, Wikis and FAQs. I sort-of have my head around R; the dizzying selection of libraries, packages, etc? Not really. I've probably missed or failed to understand something... > > I have very a simple data set. Two years (ish) of temperature data, collected and time-stamped every 10 minutes. Sample of the data from read.csv(...): > >> head(temp.data) >             DateTime Temperature > 1 2009-11-23 23:20:00        62.9 > 2 2009-11-23 23:30:00        63.4 > 3 2009-11-23 23:40:00        63.6 > 4 2009-11-23 23:50:00        64.2 > 5 2009-11-24 00:00:00        64.5 > 6 2009-11-24 00:10:00        64.7 > > Converted to an xts object: >> str(temp_data) > xts [1:83089, 1] 62.9 63.4 63.6 64.2 64.5 64.7 65.2 65.3 65.8 65.6 ... > - attr(*, "index")= atomic [1:83089] 1.26e+09 1.26e+09 1.26e+09 1.26e+09 1.26e+09 ... >   ..- attr(*, "tzone")= chr "" >   ..- attr(*, "tclass")= chr [1:2] "POSIXct" "POSIXt" > - attr(*, "class")= chr [1:2] "xts" "zoo" > - attr(*, ".indexCLASS")= chr [1:2] "POSIXct" "POSIXt" > - attr(*, ".indexTZ")= chr "" > > > So far so good! I can do all kinds of cool things like plot individual months: plot(temp_data["2009-12"]) and plot monthly, weekly, daily mean, sd, etc.
(I have to say, xts has been a dream to work with!) > > What I would like to do is plot several sections of this data on the same graph. > Specifically, I would like to plot all the data from one calendar month, regardless of year, on one plot. > i.e. one line for Jan 2009, another line for Jan 2010, anothe line for Jan 2011, etc. > > I can use xts functions to slice the data into months (or weeks, or days), but I am not sure how to arrange to get the X-axis to work right. If I do: >   plot(temp_data["2010-01"]); lines(temp_data("2011-01")) > lines aren't overlayed; the output from lines() is lost because it is far off of the right of the plot as the plot autoranged() the x and y axes. But I don't think xlim is my problem so much as I need a way to 'slide' temp_data["2011-01"] so that it will appear in the same part of the graph/plot as the 2010-01 data does. > > What I think I want to do is write a "normalizing" function that takes data for any given month and makes it "year-free"??? This way I could plot corresponding months on the same graph. One month, or a quarter, or even a full year. I don't know, however, how to convince xts to ignore the year, or if there is an xts compatible object that is year-free (or month-free for looking at week/day segements)... >

xts requires a time-based index, so there's no way to make an index "year-free". What you can do is split the xts object into years, convert all the index values to have the same year, and merge them together. The "toyear" function I've provided below converts the index values to a specific year. A "month-free" solution would be similar. I'd also recommend using plot.zoo for more complex graphs.
toyear <- function(x, year) {
  # get year of last obs
  xyear <- .indexyear(last(x))+1900
  # get index and convert to POSIXlt
  ind <- as.POSIXlt(index(x))
  # set index year to desired value
  ind$year <- year-1900
  index(x) <- ind
  # label column with year of last obs
  colnames(x) <- paste(colnames(x),xyear,sep=".")
  x
}
# split data into a list of xts objects by year
tmp_dat_yr_list <- split(temp_data, "years")
# convert each list element to be "2011"
tmp_dat_yr_list <- lapply(tmp_dat_yr_list, toyear, 2011)
# merge all list elements into one object
temp_data_by_year <- do.call(merge, tmp_dat_yr_list)
# plot.zoo has more features than plot.xts at the moment
plot.zoo(temp_data_by_year, screens=1, col=rainbow(ncol(temp_data_by_year)))

> Pointers to online answers, google search terms, etc. greatly appreciated! > > Thanks, > -=Doug >

Best,
-- Joshua Ulrich | FOSS Trading: www.fosstrading.com

From John.Morrongiello at csiro.au Tue Oct 4 00:24:51 2011 From: John.Morrongiello at csiro.au (John.Morrongiello at csiro.au) Date: Tue, 4 Oct 2011 09:24:51 +1100 Subject: [R] new standardised variable based on group membership In-Reply-To: References: Message-ID:

That works a treat Thierry, thanks! I wasn't aware of the plyr package but I like what it does- I'll put it to use work in the future.

Regards

John

-----Original Message----- From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be] Sent: Monday, 3 October 2011 6:36 PM To: Morrongiello, John (CMAR, Hobart); r-help at r-project.org Subject: RE: [R] new standardised variable based on group membership

Dear John,

You need to combine scale with a grouping function.
data(Orange)
library(plyr)
Orange <- ddply(Orange, .(Tree), function(x){
  x$ddplyAge <- scale(x$age)[, 1]
  x
})

Orange$aveAge <- ave(Orange$age, by = Orange$Tree, FUN = scale)

all.equal(Orange$ddplyAge, Orange$aveAge)

Best regards,

Thierry

> -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On behalf of John.Morrongiello at csiro.au > Sent: Monday 3 October 2011 7:34 > To: r-help at r-project.org > Subject: [R] new standardised variable based on group membership > > Hi > I have a data comprised of repeated measures of growth (5-15 records per > individual) for 580 fish (similar to Orange dataset from nlme library). I would like > to standardise these growth measures ((yi - μ)/sd) using mean and standard > deviation unique to each fish. Can someone suggest a function that would help > me do this? I've had a look at scale and sweep but can't find a worked example > that does what I'm after > > Cheers > > John > > > [[alternative HTML version deleted]]

From baptiste.auguie at googlemail.com Tue Oct 4 00:56:18 2011 From: baptiste.auguie at googlemail.com (baptiste auguie) Date: Tue, 4 Oct 2011 11:56:18 +1300 Subject: [R] new standardised variable based on group membership In-Reply-To: References: Message-ID:

More concisely,

ddply(Orange, .(Tree), transform, scaled = scale(age))

HTH,

baptiste

On 4 October 2011 11:24, wrote: > That works a treat Thierry, thanks! I wasn't aware of the plyr package but I like what it does- I'll put it to use work in the future. > > Regards > > John > > -----Original Message----- > From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be] > Sent: Monday, 3 October 2011 6:36 PM > To: Morrongiello, John (CMAR, Hobart); r-help at r-project.org > Subject: RE: [R] new standardised variable based on group membership > > Dear John, > > You need to combine scale with a grouping function. > > data(Orange) > library(plyr) > Orange <- ddply(Orange, .(Tree), function(x){ >
        x$ddplyAge <- scale(x$age)[, 1] >         x > }) > > Orange$aveAge <- ave(Orange$age, by = Orange$Tree, FUN = scale) > > all.equal(Orange$ddplyAge, Orange$aveAge) > > Best regards, > > Thierry > > >> -----Original Message----- >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] >> On behalf of John.Morrongiello at csiro.au >> Sent: Monday 3 October 2011 7:34 >> To: r-help at r-project.org >> Subject: [R] new standardised variable based on group membership >> >> Hi >> I have a data comprised of repeated measures of growth (5-15 records per >> individual) for 580 fish (similar to Orange dataset from nlme library). I would like >> to standardise these growth measures ((yi - μ)/sd) using mean and standard >> deviation unique to each fish. Can someone suggest a function that would help >> me do this? I've had a look at scale and sweep but can't find a worked example >> that does what I'm after >> >> Cheers >> >> John >> >> >>       [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
>

From John.Morrongiello at csiro.au Tue Oct 4 01:01:45 2011 From: John.Morrongiello at csiro.au (John.Morrongiello at csiro.au) Date: Tue, 4 Oct 2011 10:01:45 +1100 Subject: [R] new standardised variable based on group membership In-Reply-To: References: Message-ID:

I like that one too Baptiste, thanks

-----Original Message----- From: baptiste auguie [mailto:baptiste.auguie at googlemail.com] Sent: Tuesday, 4 October 2011 9:56 AM To: Morrongiello, John (CMAR, Hobart) Cc: Thierry.ONKELINX at inbo.be; r-help at r-project.org Subject: Re: [R] new standardised variable based on group membership

More concisely,

ddply(Orange, .(Tree), transform, scaled = scale(age))

HTH,

baptiste

On 4 October 2011 11:24, wrote: > That works a treat Thierry, thanks! I wasn't aware of the plyr package but I like what it does- I'll put it to use work in the future. > > Regards > > John > > -----Original Message----- > From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be] > Sent: Monday, 3 October 2011 6:36 PM > To: Morrongiello, John (CMAR, Hobart); r-help at r-project.org > Subject: RE: [R] new standardised variable based on group membership > > Dear John, > > You need to combine scale with a grouping function. > > data(Orange) > library(plyr) > Orange <- ddply(Orange, .(Tree), function(x){ >         x$ddplyAge <- scale(x$age)[, 1] >         x > }) > > Orange$aveAge <- ave(Orange$age, by = Orange$Tree, FUN = scale) > > all.equal(Orange$ddplyAge, Orange$aveAge) > > Best regards, > > Thierry > > >> -----Original Message----- >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] >> On behalf of John.Morrongiello at csiro.au >> Sent: Monday 3 October 2011 7:34 >> To: r-help at r-project.org >> Subject: [R] new standardised variable based on group membership >> >> Hi >> I have a data comprised of repeated measures of growth (5-15 records per >> individual) for 580 fish (similar to Orange dataset from nlme library).
I would like >> to standardise these growth measures ((yi - μ)/sd) using mean and standard >> deviation unique to each fish. Can someone suggest a function that would help >> me do this? I've had a look at scale and sweep but can't find a worked example >> that does what I'm after >> >> Cheers >> >> John >> >> >>       [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >

From camelbbs at gmail.com Tue Oct 4 04:08:06 2011 From: camelbbs at gmail.com (chunjiang he) Date: Mon, 3 Oct 2011 21:08:06 -0500 Subject: [R] a question about sort and BH Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From camelbbs at gmail.com Tue Oct 4 04:09:36 2011 From: camelbbs at gmail.com (chunjiang he) Date: Mon, 3 Oct 2011 21:09:36 -0500 Subject: [R] Hi In-Reply-To: <45294E56-D201-4B85-82F4-E9ACE208196C@sequentainc.com> References: <9CCFD2B2-373B-40B1-A57E-80845AB06977@sequentainc.com> <45294E56-D201-4B85-82F4-E9ACE208196C@sequentainc.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From michael.weylandt at gmail.com Tue Oct 4 05:00:53 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Mon, 3 Oct 2011 23:00:53 -0400 Subject: [R] distance coefficient for amatrix with ngative valus In-Reply-To: <1317691425.56813.YahooMailNeo@web65904.mail.ac4.yahoo.com> References: <1317660942.97644.YahooMailNeo@web65917.mail.ac4.yahoo.com> <1317691425.56813.YahooMailNeo@web65904.mail.ac4.yahoo.com> Message-ID:

You still haven't explained what's wrong with *almost every metric there is*, but if you want other distance metrics, have you considered those in the package you are using, via the function dsvdis()?
Consider, for example:

library(labdsv)
X <- get(data(bryceveg));
X[, sample(NROW(X))] <- (-1)*X[, sample(NROW(X))] # Put some negative values in all willy nilly like....
Y <- pco( dsvdis(X, index="bray/curtis") )
print(any(X < 0))

If you want more explanation, please provide actual details of what you are asking, as requested in my first email.

Michael Weylandt

On Mon, Oct 3, 2011 at 9:23 PM, dilshan benaragama wrote: > I am using (labdsv). If I can use euclidean distance I can do it with PCA > instead of PCO, so I am trying an alternative to PCA, but I cannot find a > disimilarity coefficient for that. > > From: R. Michael Weylandt > To: dilshan benaragama ; r-help > > Sent: Monday, October 3, 2011 3:27:53 PM > Subject: Re: [R] distance coefficient for amatrix with ngative valus > > One order of the usual coming right up! > > 1 course of "Why does XXX not work for you?" a la francaise, where XXX > is, in your case, the Euclidean distance. Specifically, any metric > worth its salt (in a normed space) satisfies dist(a,b) = dist(a+c,b+c) > so why are negative values a problem?... > > 2 sides: a "Minimal Working Example" with a light buttery sauce and a > fried "what package/code are you using" > > and, for dessert, a Winsemian special of: "read the posting guide!" > > Michael Weylandt, who is putting together a menu for a fancy dinner > even as he types > > On Mon, Oct 3, 2011 at 12:55 PM, dilshan benaragama > wrote: >> Hi, >> I need to run a PCoA (PCO) for a data set wich has both positive and >> negative values for variables. I could not find any distancecoefficient >> other than euclidean distace running for the data set. Are there any other >> coefficient works with negtive values.Also I cannot get summary out put (the >> eigen values) for PCO as for PCA. >> >> Thanks. >> Dilshan >>
       [[alternative HTML version deleted]] >> >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > > >

From michael.weylandt at gmail.com Tue Oct 4 06:05:19 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Tue, 4 Oct 2011 00:05:19 -0400 Subject: [R] distance coefficient for amatrix with ngative valus In-Reply-To: <1317698873.20366.YahooMailNeo@web65906.mail.ac4.yahoo.com> References: <1317660942.97644.YahooMailNeo@web65917.mail.ac4.yahoo.com> <1317691425.56813.YahooMailNeo@web65904.mail.ac4.yahoo.com> <1317698873.20366.YahooMailNeo@web65906.mail.ac4.yahoo.com> Message-ID:

Comments inline:

On Mon, Oct 3, 2011 at 11:27 PM, dilshan benaragama wrote: > Yes I think you did not get my problem.

No, you did not state your problem. I have replied to everything you have actually included to this point. Admittedly, I have failed to reply to things you did not say...

> Actualy I want run PCO with > (labdsv). To do that I I am trying to get the distance metrix using > following fuctions with library (vegan).

This is now the 7th email in this chain. You should mention the packages and functions you are using in the FIRST email of the chain. This is mentioned in the posting guide, which you apparently have still not yet read.

> > pca.gower<- vegdist(envt[,2:9],method="gower") > pca.eucl<-vegdist(envt[,2:9],method="euclidean") > pca.chi<-vegdist(envt[,2:9],method="chi.square") > pca.mahal<-vegdist(envt[,2:9],method="mahal") > pca.bray<-vegdist(envt,method="bray") > > However none of the functions work

They all work for any data I put in. This is perhaps where that minimal working example, which you also should have included, is necessary.
The append at the end of each of the 7 emails in this chain that tells you to read the posting guide also asks for this, as did I explicitly.

> (gives an error saying that is not > working due to negatve values)

No, they each give warnings. Warnings are not errors. They are warnings and they say "warning". Perhaps unsurprisingly, errors say "error". If you are using an old version of vegan that throws an error, you should always update before seeking help. Not surprisingly, a certain document suggests this.

> except euclidean distance for the raw data > set as the raw data has negative values for some variables. It is no point > of using euclidean metrix with PCO as we can do the same thing from PCA. So > I need to find a way I can run PCO with a different dissimilarity metrix > for this data. It will be a great help if you can help me on this

Actually read the warning message: it warns you that you have given negative data to an ecological function and suggests this might be a point you look into, as this usually suggests a user-end problem. It does not fail to work in any sense of the word, as evidenced by the output of distances. If negative data is nonsense, you should heed this warning; if you know it's not, disregard it.

More importantly, as I said in my initial response, any distance metric worth its salt is translation invariant. To wit,

x <- matrix(rnorm(50),5)
d1 = vegdist(x, method="gower")
d2 = vegdist(x + abs(min(x))*3, method="gower")
all.equal(as.numeric(d1), as.numeric(d2))
TRUE

In fairness, I'll admit this does not seem to work for the bray distance. I am not an ecologist and I do not know why this would be -- it does leave me somewhat confused as to what sort of space motivates the bray metric, but that's a discussion for another time and place -- but the function still returns a valid dist object for both d1 and d2.

> > Thanks, > From: R.
Michael Weylandt > To: dilshan benaragama ; r-help >

You will note that I include the r-help list on each email on this chain while you have not; this is mentioned in the posting guide.

> Sent: Monday, October 3, 2011 10:00:53 PM > Subject: Re: [R] distance coefficient for amatrix with ngative valus > > You still haven't explained what's wrong with *almost every metric > there is*, but if you want other distance metrics have you considered > those in the package you are using, via the function dsvdis(). > Consider, for example: > > library(labdsv) > > X <- get(data(bryceveg)); > > X[, sample(NROW(X))] <- (-1)*X[, sample(NROW(X))] # Put some negative > values in all willy nilly like.... > Y <- pco( dsvdis(X, index="bray/curtis") ) > print(any(X < 0)) > > If you want more explanation, please provide actual details of what > you are asking, as requested in my first email. > > Michael Weylandt > > On Mon, Oct 3, 2011 at 9:23 PM, dilshan benaragama > wrote: >> I am using (labdsv). If I can use euclidean distance I can do it with PCA >> instead of PCO, so I am trying an alternative to PCA, but I cannot find a >> disimilarity coefficient for that. >> >> From: R. Michael Weylandt >> To: dilshan benaragama ; r-help >> >> Sent: Monday, October 3, 2011 3:27:53 PM >> Subject: Re: [R] distance coefficient for amatrix with ngative valus >> >> One order of the usual coming right up! >> >> 1 course of "Why does XXX not work for you?" a la francaise, where XXX >> is, in your case, the Euclidean distance. Specifically, any metric >> worth its salt (in a normed space) satisfies dist(a,b) = dist(a+c,b+c) >> so why are negative values a problem?... >> >> 2 sides: a "Minimal Working Example" with a light buttery sauce and a >> fried "what package/code are you using" >> >> and, for dessert, a Winsemian special of: "read the posting guide!"
>> >> Michael Weylandt, who is putting together a menu for a fancy dinner >> even as he types >> >> On Mon, Oct 3, 2011 at 12:55 PM, dilshan benaragama >> wrote: >>> Hi, >>> I need to run a PCoA (PCO) for a data set wich has both positive and >>> negative values for variables. I could not find any distancecoefficient >>> other than euclidean distace running for the data set. Are there any >>> other >>> coefficient works with negtive values.Also I cannot get summary out put >>> (the >>> eigen values) for PCO as for PCA. >>> >>> Thanks. >>> Dilshan >>>        [[alternative HTML version deleted]] >>> >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >> >> >> > > >

Would you care to elaborate further as to what the actual problem entails, with a minimal working example? More generally, might I suggest you learn how these metrics work and then apply the most appropriate one rather than groping blindly after something solely on the criterion of it being non-Euclidean. If you need other metrics, look into the various p-norms, all of which are implemented directly in R by way of the dist() function, as are a few other norms with which I am not immediately familiar.
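A minimal base-R sketch of the p-norm suggestion above, using made-up illustration data rather than the poster's: `dist()` computes Minkowski (p-norm) distances through `method = "minkowski"` and its `p` argument. Because these distances depend only on the differences between rows, adding the same constant to every value (for instance to make everything positive) should leave them unchanged up to rounding.

```r
# Made-up illustration data: 5 observations on 4 variables, some negative
set.seed(1)
x <- matrix(rnorm(20), nrow = 5)

# Minkowski (p-norm) distances between rows:
# p = 1 is the Manhattan distance, p = 2 the Euclidean distance
d_p1 <- dist(x, method = "minkowski", p = 1)
d_p2 <- dist(x, method = "minkowski", p = 2)
d_p3 <- dist(x, method = "minkowski", p = 3)

# Shift every value by the same amount so that all entries are positive
x_shifted <- x + abs(min(x)) + 1

# Norm-based distances only use the differences x[i, ] - x[j, ],
# so the shift does not change them (up to rounding)
all.equal(as.numeric(d_p3),
          as.numeric(dist(x_shifted, method = "minkowski", p = 3)))
```

Other built-in choices for `method` include "maximum", "canberra" and "binary". Canberra divides by the magnitude of the values, so unlike the p-norms it is not translation invariant.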
Regards,
Michael Weylandt

From armandres at gmail.com Mon Oct 3 21:37:57 2011 From: armandres at gmail.com (=?ISO-8859-1?Q?Andr=E9s_Arag=F3n?=) Date: Mon, 3 Oct 2011 14:37:57 -0500 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: <4E89EFE3.5050804@noaa.gov> References: <4E89EFE3.5050804@noaa.gov> Message-ID:

Hi,

Try something like this:

c <- ggplot(mtcars, aes(qsec, mpg, colour=factor(cyl)))
c + stat_smooth(aes(group=cyl))+stat_smooth(aes(fill=factor(cyl)))+geom_point()

Andrés AM

2011/10/3, Thomas Adams : > I'm interested in creating a graphic -like- this: > > c <- ggplot(mtcars, aes(qsec, wt)) > c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2, > alpha = 0.2) > > but I need to show 2 sets of bands (with different shading) using 5%, > 25%, 75%, 95% limits that I specify and where the heavy blue line is the > median. I don't understand how to do this with ggplot2. What I am doing > currently is to generate 'boxplots' (with 5%, 25%, 75%, 95% limits) at > 6-hourly time steps (so I have a series of boxplots, which you can see > by clicking on a map point: > http://www.erh.noaa.gov/mmefs/index_test.php?Lat=38.2&Lon=-80.1&Zoom=5&Refresh=0&RFCOverlay=0&Model=NAEFS). > Some who use our graphics would like to see something more like the > ggplot2 with stat_smooth graphic. > > Help is much appreciated. > > Regards, > Tom > > -- > Thomas E Adams > National Weather Service > Ohio River Forecast Center > 1901 South State Route 134 > Wilmington, OH 45177 > > EMAIL: thomas.adams at noaa.gov > > VOICE: 937-383-0528 > FAX: 937-383-0033 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
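Tom's request (two shaded bands at quantiles he chooses, plus a heavy median line) does not strictly need a smoother: the quantiles can be computed at each time step and drawn directly. Below is a minimal base-graphics sketch on simulated ensemble data. `times`, `ens` and the simulated values are hypothetical stand-ins for his 6-hourly ensemble output. With ggplot2, the same precomputed quantile rows could instead be drawn with two `geom_ribbon()` layers and a `geom_line()`.

```r
# Hypothetical ensemble: 40 members at 6-hourly steps over 5 days
set.seed(42)
times <- seq(0, 120, by = 6)                     # forecast hour (21 steps)
ens <- sapply(times, function(h) 10 + 0.05 * h + rnorm(40, sd = 1 + h / 60))
# ens is a 40 x 21 matrix: rows = members, columns = time steps

# Quantiles at each time step; rows are the 5%, 25%, 50%, 75%, 95% levels
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)
q <- apply(ens, 2, quantile, probs = probs)

# Empty plot scaled to the outer band
plot(times, q[3, ], type = "n", ylim = range(q),
     xlab = "Forecast hour", ylab = "Value")
# Outer band (5%-95%), light shading
polygon(c(times, rev(times)), c(q[1, ], rev(q[5, ])),
        col = "lightblue", border = NA)
# Inner band (25%-75%), darker shading drawn on top
polygon(c(times, rev(times)), c(q[2, ], rev(q[4, ])),
        col = "steelblue", border = NA)
# Median as a heavy line
lines(times, q[3, ], col = "darkblue", lwd = 3)
```

Computing the quantiles explicitly keeps the band limits exactly at the levels specified, rather than at whatever interval a smoother reports.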
>

From salmi.hassani at gmail.com Mon Oct 3 21:52:45 2011 From: salmi.hassani at gmail.com (AHmed) Date: Mon, 3 Oct 2011 19:52:45 +0000 Subject: [R] ROC plot for KNN References: Message-ID:

Qian Liu gmail.com> writes: > > Hi I need some help with ploting the ROC for K-nearest neighbors. Since KNN > is a non-parametric classification methods, the predicted value will be > either 0 or 1. > It will not be able to test for different cutoff to plot ROC. What is the > package or functions I should use to plot ROC for KNN? > > Thanks. > Qian > > [[alternative HTML version deleted]] > >

I'm wondering if you found any help or any code to plot the ROC curve.

Thanks.
Salmi

From MikeP at kfoc.net Mon Oct 3 22:38:24 2011 From: MikeP at kfoc.net (Mike Pfeiff) Date: Mon, 3 Oct 2011 20:38:24 +0000 Subject: [R] read .csv from web from password protected site In-Reply-To: References: Message-ID:

Yes, I meant to reply to all (sorry, still new at asking for help).

1. No, I am not able to open the file when I insert "userID:password@" between "http://" and "www": http://userid:password at www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt (replacing userid and password with my actual information that is known to work)

2. Yes, the webpage where the data is stored does require me to enter my userid and password and gives me the option to remember the password for future use, which I have selected. Subsequent visits to the website do not require me to reenter info.

-----Original Message----- From: Sarah Goslee [mailto:sarah.goslee at gmail.com] Sent: Monday, October 03, 2011 3:07 PM To: Mike Pfeiff; r-help Subject: Re: [R] read .csv from web from password protected site

Hi,

I've assumed that you meant to send this to the R-help list, and not just me.

On Mon, Oct 3, 2011 at 3:54 PM, Mike Pfeiff wrote: > Sarah, Thanks for the suggestion. Although, read.table("http://userid:password at my.url/file.csv") did not work as it returned the following: > >
       Error in file(file, "rt") : cannot open the connection >        In addition: Warning message: >        In file(file, "rt") : unable to resolve 'userid' > > (where 'userid' is my actual userid) > > I've tried the following RCurl commands... > >        myURL  ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt" >        h=getURL(myURL, userpw = "userid:passwod", followlocation=TRUE) >        test=read.table(h,header=TRUE,sep=",") > > ..and I can't get the data to read and get the following errors: > >        Error in file(file, "rt") : cannot open the connection >        In addition: Warning message: >        In file(file, "rt") : unable to resolve 'userid' > > I'm at a total loss. Any assistance anyone could provide would be greatly appreciated.

You can get to the file using your web browser and that userid/password combo, right? Do you have to go through a dialog box? Press a button to login? Any of those have the potential to complicate the task. If so, you'll need to work through the Forms section at http://www.omegahat.org/RCurl/philosophy.html

Sarah

> > > -----Original Message----- > From: Sarah Goslee [mailto:sarah.goslee at gmail.com] > Sent: Monday, October 03, 2011 2:26 PM > To: Mike Pfeiff > Cc: r-help at r-project.org > Subject: Re: [R] read .csv from web from password protected site > > Hi Mike, > > On Mon, Oct 3, 2011 at 12:31 PM, Mike Pfeiff wrote: >> I am very new to R and have been struggling trying to read a basic ".csv" file from a password protected site with the following code: >> >> myURL  ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt" >> test2=read.table(url(myURL),header=TRUE,sep=",") > > >> A 'data.frame' is returned into the workspace, however it is not the data contained in the ".csv" file. I think this occurs because the website where I am trying to retrieve the data is password protected. >> >> Is there a way to specify the username and password?
> > > I'd try first > read.table("http://userid:password at my.url/file.csv"), which is the standard way to do it (hint: try that form in your web browser and see whether you can access the data), and if that doesn't work look into the RCurl package. The list archives have a fair bit of information on this topic. > > Sarah > > >> Any guidance would be greatly appreciated. >> >> Sincerely, >> >> Mike >> >

-- Sarah Goslee http://www.functionaldiversity.org

From xenon99 at hotmail.com Tue Oct 4 00:44:49 2011 From: xenon99 at hotmail.com (Darius H) Date: Mon, 3 Oct 2011 22:44:49 +0000 Subject: [R] rolling regression In-Reply-To: References: , Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From aurelien.philippot at gmail.com Tue Oct 4 01:01:45 2011 From: aurelien.philippot at gmail.com (=?ISO-8859-1?Q?Aur=E9lien_PHILIPPOT?=) Date: Tue, 4 Oct 2011 01:01:45 +0200 Subject: [R] Efficient way to do a merge in R Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From s_fallahpour at yahoo.com Tue Oct 4 01:20:59 2011 From: s_fallahpour at yahoo.com (saber fallahpour) Date: Mon, 3 Oct 2011 16:20:59 -0700 (PDT) Subject: [R] Quasi-Binomial simulation Message-ID: <1317684059.72627.YahooMailNeo@web38308.mail.mud.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From emayssat at gmail.com Tue Oct 4 01:48:59 2011 From: emayssat at gmail.com (Emmanuel Mayssat) Date: Mon, 3 Oct 2011 16:48:59 -0700 Subject: [R] cannot install.packages("data.table") Message-ID:

Hello,

I am new at R. I am trying to see if R can work for me. I need to do database like lookup (select * from table where name=='toto') and work with matrix (transpose, add columns, remove rows, etc). It seems that the data.table package can help. http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table

I installed R and ...
> install.packages("data.table")
Warning in install.packages("data.table") :
  argument 'lib' is missing: using '/usr/local/lib/R/site-library'
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘data.table’ is not available
> install.packages() doesn't show the package. Where can I find it?

--
Emmanuel

From xn8spicer at gmail.com Tue Oct 4 05:12:21 2011 From: xn8spicer at gmail.com (Jeanne M. Spicer) Date: Mon, 3 Oct 2011 23:12:21 -0400 Subject: [R] inconsistent behavior of summary function Message-ID: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From tyler_rinker at hotmail.com Mon Oct 3 22:02:04 2011 From: tyler_rinker at hotmail.com (Tyler Rinker) Date: Mon, 3 Oct 2011 16:02:04 -0400 Subject: [R] Import in R with White Spaces In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From smartguy3k at gmail.com Tue Oct 4 06:19:51 2011 From: smartguy3k at gmail.com (Smart Guy) Date: Tue, 4 Oct 2011 09:49:51 +0530 Subject: [R] The use of period in function names and variable names Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From mailinglist.honeypot at gmail.com Tue Oct 4 06:30:06 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 4 Oct 2011 00:30:06 -0400 Subject: [R] The use of period in function names and variable names In-Reply-To: References: Message-ID:

Hi,

On Tue, Oct 4, 2011 at 12:19 AM, Smart Guy wrote: > Hi, >     I am looking for some guidance on whether I can use the period(.) in > function names and variable names.

Yes you can.

> For example: > > my.function.name <- function(my.data.variable, my.radius, my.another.var, > my.value = 10) > { > > } > > Will this pose any problems regarding older and current version of R.

Not really. For the most part, you can use this pattern as you like.
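A small sketch of the S3 caveat that comes up with dotted names. The function names and the class `"temp"` are hypothetical: a dotted name works as an ordinary identifier, but a name that happens to have the form `<generic>.<class>` is also picked up by S3 dispatch for objects of that class.

```r
# Dotted names are ordinary identifiers
my.function.name <- function(my.data.variable, my.radius, my.value = 10) {
  my.data.variable * my.radius + my.value
}
my.function.name(2, 3)   # 16

# ...but a dotted name of the form <generic>.<class> doubles as an S3 method:
# 'summary.temp' may have been meant as a plain helper, yet summary()
# dispatches to it for any object whose class is "temp"
summary.temp <- function(object, ...) "called summary.temp"
obj <- structure(list(1, 2), class = "temp")
summary(obj)             # "called summary.temp"
```

Avoiding names that start with an existing generic (print., summary., plot., t., and so on) sidesteps the accidental-method problem.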
One thing to keep in mind is that S3 method dispatch matches methods based on a METHOD_NAME.OBJECT_CLASS pattern.

If you want to know more about the S3 stuff, you can start at ?UseMethod.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From rolf.turner at xtra.co.nz Tue Oct 4 08:06:50 2011
From: rolf.turner at xtra.co.nz (Rolf Turner)
Date: Tue, 04 Oct 2011 19:06:50 +1300
Subject: [R] distance coefficient for amatrix with ngative valus
References: <1317660942.97644.YahooMailNeo@web65917.mail.ac4.yahoo.com> <1317691425.56813.YahooMailNeo@web65904.mail.ac4.yahoo.com> <1317698873.20366.YahooMailNeo@web65906.mail.ac4.yahoo.com>
Message-ID: <4E8AA27A.5050104@xtra.co.nz>

On 04/10/11 17:05, R. Michael Weylandt wrote:
> More importantly, as I said in my initial response, any distance
> metric worth its salt is translation invariant.

Point of order, Mr. Chairman. (This is really *toadally* off topic; my apologies, but I couldn't resist --- I trained as a pure mathematician).

A *metric* need not in general be translation invariant. Indeed a metric need not be defined on a space in which translation makes any sense.

A metric defined in terms of a *norm* (on a normed vector space) by

    rho(x,y) = ||x - y||

is of course by definition translation invariant, and that's what most of us think in terms of. But there are perfectly ``reasonable'' metrics, defined on vector spaces, which are not translation invariant. Whether these are ``worth their salt'' is I suppose a matter of taste. (You should pardon the expression. :-) )

A simple e.g. of a non-translation-invariant metric is

    rho(x,y) = |x - y|/(1 + |x| + |y|)

(defined on the real line). It is easily checked that rho(.,.) satisfies the four conditions that a metric must satisfy. (Exercise for the interested reader.)
Note that rho(1,2) = 1/4 but rho(2,3) = 1/6, ergo not translation invariant.

cheers,

Rolf Turner

From pdalgd at gmail.com Tue Oct 4 08:28:46 2011
From: pdalgd at gmail.com (peter dalgaard)
Date: Tue, 4 Oct 2011 08:28:46 +0200
Subject: Re: [R] generating Venn diagram with 6 sets
References: <695BFFF8-7143-49FE-8A88-8DCA1B2C34B9@gmail.com>
Message-ID: <52057FB0-83D8-40A2-B693-77E347F7C872@gmail.com>

On Oct 3, 2011, at 21:25 , Mao Jianfeng wrote:

> Dear Peter,
>
> I am glad to hearing your reply. That is really nice. Thanks a lot.
>
>
> #######################################
> # (1) the problem of the plot venneuler generated me is sets (A,B,C,D,E,F) should shared 69604 elements.
> # But, it illustrated nothing for me for this 6 sets sharing.
>
>
>
> But, vennerable can not be installed on my Mac book.
>
> Works for me. What are the symptoms?
>
>
> ######################################
> # (2) I compiled vennerable package, and then installed in my R-2.13.0. But the plot can only generated 5 sets, and looks not good.
>
> I have not saved the codes I tested. Could you please show me your codes? or just show me the plot you generated.

You said that it couldn't be installed, I just tried installing it (from the R-forge binary). I did not attempt to solve your problem.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

From fernando.cabrera at nordea.com Tue Oct 4 08:36:59 2011
From: fernando.cabrera at nordea.com (fernando.cabrera at nordea.com)
Date: Tue, 4 Oct 2011 09:36:59 +0300
Subject: Re: [R] Matrix/Vector manipulation
In-Reply-To: <9DE405308A6AA24AA794B76282C6C00F0A069BF445@HQ-POST1>
References: <9DE405308A6AA24AA794B76282C6C00F0A069BF445@HQ-POST1>

Stylish, but ifelse only includes a cumsum less or equal than v and ignores the remainder, if v does not fit equally in say the first two weight buckets.
> R <- c(1.2, 1.3, 1.5)
> W <- c(3,2,5)
> my_cumsum(4, R, W) # should take 3*1.2 + 1*1.3
[1] 4.0
> sum(ifelse(cumsum(W) <= 4, W, 0) * R) # ignores the 1*1.3 part because 3+2 > 4
[1] 3.6

Cheers,
Fer

-----Original Message-----
From: David Reiner [mailto:David.Reiner at xrtrading.com]
Sent: 3. oktober 2011 17:57
To: Cabrera, Fernando Álvarez; r-help at r-project.org
Subject: RE: [R] Matrix/Vector manipulation

sum(ifelse(cumsum(W)<=v, W, 0) * R)

HTH,
David L. Reiner

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of fernando.cabrera at nordea.com
Sent: Monday, October 03, 2011 9:50 AM
To: r-help at r-project.org
Subject: [SPAM] - [R] Matrix/Vector manipulation - Bayesian Filter detected spam

Hi guys,

Have the following problem computing vectors with pure vector algebra and end up reverting to recursion or for-looping.

Function my_cumsum calculates a weighted average (W) of ratios (R), but only up to the given size/volume (v). Now I recurse into the vector (from left to right) with what you have left from the difference of volume minus current weight, and stop when the difference is less than or equal to the current weight. Vectors W and R have the same length, and v is always a positive integer.

W: {w_1 w_2 .. w_m}
R: {r_1 r_2 .. r_m}

my_cumsum <- function(v, R, W)
{
  if (v <= W[1]) # check the head
    v*R[1]
  else
    W[1]*R[1] + my_cumsum(v - W[1], R[2:length(R)], W[2:length(W)]) # recurse the tail
}

Any help is greatly appreciated!

Fernando Alvarez

"Great ideas originate in the muscles." ~ Thomas A. Edison

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

This e-mail and any materials attached hereto, including, without limitation, all content hereof and thereof (collectively, "XR Content") are confidential and proprietary to XR Trading, LLC ("XR") and/or its affiliates, and are protected by intellectual property laws.
This e-mail and any materials attached hereto, including, without limitation, all content hereof and thereof (collectively, "XR Content") are confidential and proprietary to XR Trading, LLC ("XR") and/or its affiliates, and are protected by intellectual property laws. Without the prior written consent of XR, the XR Content may not (i) be disclosed to any third party or (ii) be reproduced or otherwise used by anyone other than current employees of XR or its affiliates, on behalf of XR or its affiliates. THE XR CONTENT IS PROVIDED AS IS, WITHOUT REPRESENTATIONS OR WARRANTIES OF ANY KIND. TO THE MAXIMUM EXTENT PERMISSIBLE UNDER APPLICABLE LAW, XR HEREBY DISCLAIMS ANY AND ALL WARRANTIES, EXPRESS AND IMPLIED, RELATING TO THE XR CONTENT, AND NEITHER XR NOR ANY OF ITS AFFILIATES SHALL IN ANY EVENT BE LIABLE FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, INCLUDING, BUT NOT LIMITED TO, DIRECT, INDIRECT, CONSEQUENTIAL, SPECIAL AND PUNITIVE DAMAGES, LOSS OF PROFITS AND TRADING LOSSES, RESULTING FROM ANY PERSON'S USE OR RELIANCE UPON, OR INABILITY TO USE, ANY XR CONTENT, EVEN IF XR IS ADVISED OF THE POSSIBILITY OF SUCH DAMAGES OR IF SUCH DAMAGES WERE FORESEEABLE. From daniel at umd.edu Tue Oct 4 08:48:02 2011 From: daniel at umd.edu (Daniel Malter) Date: Mon, 3 Oct 2011 23:48:02 -0700 (PDT) Subject: [R] how do i put two scatterplots on same graph In-Reply-To: <1317709183813-3870030.post@n4.nabble.com> References: <1317709183813-3870030.post@n4.nabble.com> Message-ID: <1317710882638-3870074.post@n4.nabble.com> ?plot ?points You will probably need to get some R basics down as to how to index certain subsets of your data. This you find in any introductory R manual. HTH, Daniel -- View this message in context: http://r.789695.n4.nabble.com/how-do-i-put-two-scatterplots-on-same-graph-tp3870030p3870074.html Sent from the R help mailing list archive at Nabble.com. 
From sitinormah.hasan at gmail.com Tue Oct 4 06:37:38 2011
From: sitinormah.hasan at gmail.com (normah)
Date: Mon, 3 Oct 2011 21:37:38 -0700 (PDT)
Subject: [R] how to make ARFIMA forecast by using r?
Message-ID: <1317703058179-3869928.post@n4.nabble.com>

please help..
I have estimated the values of the parameters for AR, MA and fractional d, but I have a problem finding the right command for forecasting an ARFIMA model. please help......

--
View this message in context: http://r.789695.n4.nabble.com/how-to-make-ARFIMA-forecast-by-using-r-tp3869928p3869928.html
Sent from the R help mailing list archive at Nabble.com.

From dgou at mac.com Tue Oct 4 07:38:01 2011
From: dgou at mac.com (Douglas Philips)
Date: Tue, 04 Oct 2011 01:38:01 -0400
Subject: Re: [R] xts/time-series and plot questions...
Message-ID: <613C0EBC-063A-49CE-B75E-234F55BEB3DF@mac.com>

On 2011 Oct 3, at 4:55 PM, Joshua Ulrich wrote:
> ts requires a time-based index, so there's no way to make an index
> "year-free". What you can do, is split the xts object into years,
> convert all the index values to have the same year, and merge them
> together.

Ah... of course. I had read the vignette for xts, but didn't notice the indexing capabilities. Thank you.

> ded below converts the index values to
> a specific year. A "month-free" solution would be similar. I'd also
> recommend using plot.zoo for more complex graphs.

Great! I was able to plot the comparative data quite easily with that code.

You mentioned "month-free" solution being similar, but I am not sure about that part. Specifically, one thing I want to do is compare data by month, but I also want to see that comparison preserve the day of the week. For example, 2011/10/01 is a Saturday, but 2010/10/01 is a Friday. When I see both on the same plot, I want to see them lined up by days of week, rather than have the first day for each at the same place on the graph. (I'm not sure I'm communicating this well, I hope this is making sense).
Thank you again!
--Doug

From jricci at corcare.net Tue Oct 4 08:19:43 2011
From: jricci at corcare.net (jricci)
Date: Mon, 3 Oct 2011 23:19:43 -0700 (PDT)
Subject: [R] how do i put two scatterplots on same graph
Message-ID: <1317709183813-3870030.post@n4.nabble.com>

Have two sets of scatterplot data
hypothetically
a) stem length vs number of petals in red flowers
b) stem length vs number of petals in white flowers

want to place on same scatter plot with same x,y axis but different colored markers

How do I do this in R?

--
View this message in context: http://r.789695.n4.nabble.com/how-do-i-put-two-scatterplots-on-same-graph-tp3870030p3870030.html
Sent from the R help mailing list archive at Nabble.com.

From divyamurali13 at gmail.com Tue Oct 4 08:53:43 2011
From: divyamurali13 at gmail.com (Divyam)
Date: Mon, 3 Oct 2011 23:53:43 -0700 (PDT)
Subject: [R] handling constant factors in prediction using svm
Message-ID: <1317711223804-3870093.post@n4.nabble.com>

Hi users!

I am fitting a model with several factor variables as independents using svm. Since there are lots of categorical variables, the training and test data sets have been created using dummy.data.frame option from dummies package. I have a factor A in the training data set with 2 levels (0,1). In the test set, this factor A has only 1 level (1) and hence when applying dummy.data.frame, the variable gets dropped (and that's how I want it too). The problem comes when I am trying to predict the test data as an error is thrown saying A0 object is not found.

Is there any way to solve this problem?

Thanks
Divya

--
View this message in context: http://r.789695.n4.nabble.com/handling-constant-factors-in-prediction-using-svm-tp3870093p3870093.html
Sent from the R help mailing list archive at Nabble.com.
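An editorial sketch (not from the thread) of the base-graphics approach behind the ?plot / ?points pointer given earlier, for the two-scatterplots question above. The data frames red and white and their columns stem and petals are made-up placeholders. The idea: compute shared axis limits first, draw one group with plot(), overlay the other with points() in a different colour and symbol, and finish with legend().

```r
# Made-up example data: stem length vs number of petals for two flower colours
set.seed(42)
red   <- data.frame(stem = runif(20, 5, 15), petals = rpois(20, 8))
white <- data.frame(stem = runif(20, 5, 15), petals = rpois(20, 5))

# Shared axis ranges so both groups fit on the same axes
xlim <- range(red$stem, white$stem)
ylim <- range(red$petals, white$petals)

# The first group creates the plot; the second is overlaid with points()
plot(red$stem, red$petals, xlim = xlim, ylim = ylim, col = "red", pch = 16,
     xlab = "Stem length", ylab = "Number of petals")
points(white$stem, white$petals, col = "grey50", pch = 17)
legend("topleft", legend = c("red flowers", "white flowers"),
       col = c("red", "grey50"), pch = c(16, 17))
```

If both groups live in one data frame with a colour column, ggplot2 (suggested later in the thread) can map that column to marker colour in a single layer instead.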
From daniel at umd.edu Tue Oct 4 08:58:42 2011
From: daniel at umd.edu (Daniel Malter)
Date: Mon, 3 Oct 2011 23:58:42 -0700 (PDT)
Subject: Re: [R] inconsistent behavior of summary function
In-Reply-To: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com>
References: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com>
Message-ID: <1317711522584-3870106.post@n4.nabble.com>

I have not read the manual, but I drew 10000 random normal vectors and 10000 random Poisson vectors of length 10000 and was unable to reproduce this behavior. Can you provide an example (self-contained code) that reproduces this problem?

Thanks,
Daniel

Jeanne M. Spicer wrote:
>
> The summary function behaves inconsistently with data frame columns, e.g.
>
> summary(rock) #max of area 12212, correct
> summary(rock$area) #max of area 12210, incorrect max
>
> I know that
> summary(rock$area, digits=5)
> will correct the error (I DID read the manual). But my point is the
> inconsistency, because I get the correct answer without having to add the
> digits option in the first statement when referring to the full dataframe.
> This is one of the first functions that beginners use and if they have to
> RTM and tinker with options before they can get a consistent value for the
> max of an integer column, it is off-putting to say the least. At worst it
> confirms the skeptic's suspicion that open-source software is a bit flaky.
> Would it be out of line to report this to r-bugs -- at least to improve on
> the documentation?
>
> -jms
> r2.13.1 maclion
>

--
View this message in context: http://r.789695.n4.nabble.com/inconsistent-behavior-of-summary-function-tp3869906p3870106.html
Sent from the R help mailing list archive at Nabble.com.

From rainer.schuermann at gmx.net Tue Oct 4 09:40:22 2011
From: rainer.schuermann at gmx.net (Rainer Schuermann)
Date: Tue, 04 Oct 2011 09:40:22 +0200
Subject: Re: [R] Efficient way to do a merge in R
Message-ID: <2860845.QYlvPtVFHK@augeatur>

> Any comments are very welcome,

So I give it a shot, although I don't have answers but only some ideas which avenues I would explore, not being an expert at all:

1. I would try to be more restrictive with the columns used for merge, trying something like
m1 <- merge( x, y, by.x = "V1", by.y = "V1", all = TRUE )

2. It may be an option to use match() directly:
indices <- match( y$V1, x$V1 )
That should give you a vector of 300,000 indices mapping the y values to their corresponding x records. I assume that there is always one record in y matching one record in x. You would still need to write some code to add the corresponding y values to a new column in x.

3. If that fails, and nobody else has a better idea, I would consider using a database engine for the job.

Again, no expert advice, just a few ideas!

Rgds,
Rainer

On Tuesday 04 October 2011 01:01:45 Aurélien PHILIPPOT wrote:
> Dear all,
> I am new in R and I have been faced with the following problem, that slows
> me down a lot. I am short of ideas to circumvent it. So, any help would be
> highly appreciated:
>
> I have 2 dataframes x and y. x is very big (70 million observations),
> whereas y is smaller (300000 observations).
> All the observations of y are present in x. But y has one additional
> variable that I would like to incorporate to the dataframe x.
>
> For instance, imagine they have the following variable names:
> colnames(x)<- c("V1", "V2", "V3", "V4") and colnames(y)<- c("V1", "V2",
> "V5")
>
> -Since the observations of y are present in x, my strategy was to merge x
> and y so that the dataframe x would get the values of the variable V5 for
> the observations that are both in x and y.
>
> -So, I did the following:
> dat<- merge(x, y, all=TRUE).
>
> On a small example, it works fine. The only problem is that when I apply it
> to my big dataframe x, it really take for ever (several days and not done
> yet) and I have a very fast computer. So, I don't know whether I should
> stop now or keep on waiting.
>
> Does anyone have any idea to perform this operation in a more efficient way
> (in terms of computation time)?
> In addition, does anyone know how to incorporate some sort of counter in a
> program to check how much work has been done at a given point of time?
>
> Any comments are very welcome,
> Thanks,
>
> Best,
> Aurelien

From silverstein_yellowcard at yahoo.com Tue Oct 4 09:48:55 2011
From: silverstein_yellowcard at yahoo.com (Leynnard Rey Matillano)
Date: Tue, 4 Oct 2011 00:48:55 -0700 (PDT)
Subject: [R] shapefile kriging
Message-ID: <1317714535.77597.YahooMailNeo@web110705.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...

From fernando.cabrera at nordea.com Tue Oct 4 09:53:22 2011
From: fernando.cabrera at nordea.com (fernando.cabrera at nordea.com)
Date: Tue, 4 Oct 2011 09:53:22 +0200
Subject: Re: [R] Matrix/Vector manipulation
References: <9DE405308A6AA24AA794B76282C6C00F0A069BF445@HQ-POST1>

Correction to my previous mail: my_cumsum(4,R,W) does not return 4.0, it returns 4.9!

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of fernando.cabrera at nordea.com
Sent: 4. oktober 2011 08:37
To: r-help at r-project.org
Subject: Re: [R] Matrix/Vector manipulation

Stylish, but ifelse only includes a cumsum less or equal than v and ignores the remainder, if v does not fit equally in say the first two weight buckets.

> R <- c(1.2, 1.3, 1.5)
> W <- c(3,2,5)
> my_cumsum(4, R, W) # should take 3*1.2 + 1*1.3
[1] 4.0
> sum(ifelse(cumsum(W) <= 4, W, 0) * R) # ignores the 1*1.3 part because 3+2 > 4
[1] 3.6

Cheers,
Fer

-----Original Message-----
From: David Reiner [mailto:David.Reiner at xrtrading.com]
Sent: 3. oktober 2011 17:57
To: Cabrera, Fernando Álvarez; r-help at r-project.org
Subject: RE: [R] Matrix/Vector manipulation

sum(ifelse(cumsum(W)<=v, W, 0) * R)

HTH,
David L. Reiner

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of fernando.cabrera at nordea.com
Sent: Monday, October 03, 2011 9:50 AM
To: r-help at r-project.org
Subject: [SPAM] - [R] Matrix/Vector manipulation - Bayesian Filter detected spam

Hi guys,

Have the following problem computing vectors with pure vector algebra and end up reverting to recursion or for-looping.

Function my_cumsum calculates a weighted average (W) of ratios (R), but only up to the given size/volume (v). Now I recurse into the vector (from left to right) with what you have left from the difference of volume minus current weight, and stop when the difference is less than or equal to the current weight. Vectors W and R have the same length, and v is always a positive integer.

W: {w_1 w_2 .. w_m}
R: {r_1 r_2 .. r_m}

my_cumsum <- function(v, R, W)
{
  if (v <= W[1]) # check the head
    v*R[1]
  else
    W[1]*R[1] + my_cumsum(v - W[1], R[2:length(R)], W[2:length(W)]) # recurse the tail
}

Any help is greatly appreciated!

Fernando Alvarez

"Great ideas originate in the muscles." ~ Thomas A. Edison

From jwiley.psych at gmail.com Tue Oct 4 10:04:40 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Tue, 4 Oct 2011 01:04:40 -0700
Subject: Re: [R] Efficient way to do a merge in R
In-Reply-To: <2860845.QYlvPtVFHK@augeatur>
References: <2860845.QYlvPtVFHK@augeatur>

On Tue, Oct 4, 2011 at 12:40 AM, Rainer Schuermann wrote:
>> Any comments are very welcome,
> So I give it a shot, although I don't have answers but only some ideas which avenues I would explore, not being an
> expert at all:
>
> 1. I would try to be more restrictive with the columns used for merge, trying something like
> m1 <- merge( x, y, by.x = "V1", by.y = "V1", all = TRUE )
>
> 2.
> It may be an option to use match() directly:
> indices <- match( y$V1, x$V1 )
> That should give you a vector of 300,000 indices mapping the y values to their corresponding x records. I assume that
> there is always one record in y matching one record in x. You would still need to write some code to add the
> corresponding y values to a new column in x.

I think this idea is a good one (though even match could be slow with 70 million observations). I believe related to the extraction and assignment methods for data frames, some extra copies of data end up being made (at least this is my understanding, experts may correct me), so I would consider possibly using a list (you lose the builtin data frame checking that all variables are of the same length (same number of rows)), but I think it makes it faster to work with. If you know the indices in x where the y values should go and the class of y (say numeric) then:

tmp <- vector("numeric", 70000000)
tmp[indices] <- y$V5
x$V5 <- tmp
rm(tmp)
gc()

and you're done. Takes less than a minute to run on my little laptop (8GB RAM, 1.6ghz dual core, only slightly faster than a netbook).

>
> 3. If that fails, and nobody else has a better idea, I would consider using a database engine for the job.

Not a bad idea for working with large datasets either.

>
> Again, no expert advice, just a few ideas!
>
> Rgds,
> Rainer
>
>
> On Tuesday 04 October 2011 01:01:45 Aurélien PHILIPPOT wrote:
>> Dear all,
>> I am new in R and I have been faced with the following problem, that slows
>> me down a lot. I am short of ideas to circumvent it. So, any help would be
>> highly appreciated:
>>
>> I have 2 dataframes x and y. x is very big (70 million observations),
>> whereas y is smaller (300000 observations).
>> All the observations of y are present in x. But y has one additional
>> variable that I would like to incorporate to the dataframe x.
>>
>> For instance, imagine they have the following variable names:
>> colnames(x)<- c("V1", "V2", "V3", "V4") and colnames(y)<- c("V1", "V2",
>> "V5")
>>
>> -Since the observations of y are present in x, my strategy was to merge x
>> and y so that the dataframe x would get the values of the variable V5 for
>> the observations that are both in x and y.
>>
>> -So, I did the following:
>> dat<- merge(x, y, all=TRUE).
>>
>> On a small example, it works fine. The only problem is that when I apply it
>> to my big dataframe x, it really take for ever (several days and not done
>> yet) and I have a very fast computer. So, I don't know whether I should
>> stop now or keep on waiting.
>>
>> Does anyone have any idea to perform this operation in a more efficient way
>> (in terms of computation time)?
>> In addition, does anyone know how to incorporate some sort of counter in a
>> program to check how much work has been done at a given point of time?
>>
>> Any comments are very welcome,
>> Thanks,
>>
>> Best,
>> Aurelien
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

From rolf.turner at xtra.co.nz Tue Oct 4 10:19:11 2011
From: rolf.turner at xtra.co.nz (Rolf Turner)
Date: Tue, 04 Oct 2011 21:19:11 +1300
Subject: Re: [R] inconsistent behavior of summary function
In-Reply-To: <1317711522584-3870106.post@n4.nabble.com>
References: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> <1317711522584-3870106.post@n4.nabble.com>
Message-ID: <4E8AC17F.2020200@xtra.co.nz>

On 04/10/11 19:58, Daniel Malter wrote:
> I have not read the manual, but I drew 10000 random normal vectors and 10000
> random Poisson vectors of length 10000 and was unable to reproduce this
> behavior. Can you provide an example (self-contained code) that reproduces
> this problem?

The OP *did* provide a reproducible example. The "rock" data are a built-in data set. See ?rock.

Also the OP is correct!

cheers,

Rolf Turner

From mdowle at mdowle.plus.com Tue Oct 4 10:48:09 2011
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Tue, 4 Oct 2011 09:48:09 +0100
Subject: Re: [R] Efficient way to do a merge in R
References: <2860845.QYlvPtVFHK@augeatur>

"Joshua Wiley" wrote in message news:CANz9Z_KopuwkzB-zxr96PVuLhHf2ZNxNtxSO9xnyhO-_JUMkcQ at mail.gmail.com...
> On Tue, Oct 4, 2011 at 12:40 AM, Rainer Schuermann
> wrote:
>>> Any comments are very welcome,
>>
>> 3. If that fails, and nobody else has a better idea, I would consider
>> using a database engine for the job.
>
> Not a bad idea for working with large datasets either.
>

or, the data.table package
http://datatable.r-forge.r-project.org/

Matthew

From mdowle at mdowle.plus.com Tue Oct 4 11:08:39 2011
From: mdowle at mdowle.plus.com (Matthew Dowle)
Date: Tue, 4 Oct 2011 10:08:39 +0100
Subject: Re: [R] cannot install.packages("data.table")

Assuming you can install other packages ok, data.table depends on R >=2.12.0. Which version of R do you have?
_If_ that's the problem, does anyone know if anything prevents R's error message from stating which dependency isn't satisfied? I think I've seen users confused by this before, for other packages too.

Matthew

"Emmanuel Mayssat" wrote in message news:CACB6ZmCTdRJKBfTQrW+tv2owPtRkGwYTc_-HVVtGUZwu9Gq7TA at mail.gmail.com...
Hello,

I am new at R. I am trying to see if R can work for me.
I need to do database like lookup (select * from table where name=='toto')
and work with matrix (transpose, add columns, remove rows, etc).

It seems that the data.table package can help.
http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table

I installed R and ...

> install.packages("data.table")
Warning in install.packages("data.table") :
  argument 'lib' is missing: using '/usr/local/lib/R/site-library'
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘data.table’ is not available
>

install.packages() doesn't show the package.
where can I find it?

-- 
Emmanuel

From paul.hiemstra at knmi.nl Tue Oct 4 11:19:22 2011
From: paul.hiemstra at knmi.nl (Paul Hiemstra)
Date: Tue, 04 Oct 2011 09:19:22 +0000
Subject: Re: [R] shapefile kriging
In-Reply-To: <1317714535.77597.YahooMailNeo@web110705.mail.gq1.yahoo.com>
References: <1317714535.77597.YahooMailNeo@web110705.mail.gq1.yahoo.com>
Message-ID: <4E8ACF9A.7080503@knmi.nl>

On 10/04/2011 07:48 AM, Leynnard Rey Matillano wrote:
> I'm new to R and I'm working on point shapefiles. Is there a way that you could interpolate a shapefile via kriging in R using an attribute? All examples on the internet are using txt files and CSVs. Thanks a lot.
Hi, Kriging is never done on a txt file or a csv file, but on an R object. For gstat this is a SpatialPointsDataFrame and for geoR this is another type of object. CVS files, txt files (what is the difference?) and shapefiles can all be read into SpatialPointsDataFrame's. For reading shapefiles, see the rgdal package. Paul -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From paul.hiemstra at knmi.nl Tue Oct 4 11:20:31 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Tue, 04 Oct 2011 09:20:31 +0000 Subject: [R] how do i put two scatterplots on same graph In-Reply-To: <1317709183813-3870030.post@n4.nabble.com> References: <1317709183813-3870030.post@n4.nabble.com> Message-ID: <4E8ACFDF.9070108@knmi.nl> On 10/04/2011 06:19 AM, jricci wrote: > Have two sets of scatterplot data > hypothetically > a) stem lenght vs number of petals in red flowers > b) stem lenght vs number of petals in white flowers > > want to place on same scatter plot with same x,y axis but different collored > markers > > How do I do this in R > > -- > View this message in context: http://r.789695.n4.nabble.com/how-do-i-put-two-scatterplots-on-same-graph-tp3870030p3870030.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. Hi, You could take a look at the ggplot2 package. good luck, Paul -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. 
Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From cjpauw at gmail.com Tue Oct 4 11:28:22 2011 From: cjpauw at gmail.com (christiaan pauw) Date: Tue, 4 Oct 2011 11:28:22 +0200 Subject: [R] matrix of chi-square results for all combinations of data frame Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From S.Ellison at LGCGroup.com Tue Oct 4 13:04:37 2011 From: S.Ellison at LGCGroup.com (S Ellison) Date: Tue, 4 Oct 2011 12:04:37 +0100 Subject: [R] The use of period in function names and variable names In-Reply-To: References: Message-ID: See section 10.3.2 'Identifiers' in the R Language Definition (always distributed with R in the HTML help system), or ?make.names, for a concise statement of what constitutes a valid variable name in R. It's actually underscores that might give trouble with older versions, not '.'. But they'd have to be a lot older by R standards (pre 1.9.0). I am not sure why there has been a recent shift away from periods and towards camelCase in some R packages; personally I find a period or underscore much more useful for making a variable name readable. And a mix of camelCase and period.breaks makes it a lot harder to guess which case-sensitive string to use. The number of different combinations of case and period I end up trying for R.Version (occasionally used, never quite often enough to be automatic) defies belief ;-). S Ellison > From: r-help-bounces at r-project.org On Behalf Of Smart Guy > Sent: 04 October 2011 05:20 > To: r-help at r-project.org > Subject: [R] The use of period in function names and variable names > > Hi, > I am looking for some guidance on whether I can use the > period(.) in function names and variable names.
From bbolker at gmail.com Tue Oct 4 14:14:33 2011 From: bbolker at gmail.com (Ben Bolker) Date: Tue, 4 Oct 2011 12:14:33 +0000 Subject: [R] Quasi-Binomial simulation References: <1317684059.72627.YahooMailNeo@web38308.mail.mud.yahoo.com> Message-ID: saber fallahpour yahoo.com> writes: > > Hi > I want to do simulation on quasi-binomial distribution with some covariates. > Does anyone have an idea how to do that? > There is no such thing as a quasi-binomial distribution, but if you parameterized the beta-binomial distribution appropriately I think it would be straightforward to generate discrete data with a specified maximum value, a mean that was specified by an inverse-link function and a design matrix applied to the covariates, and had a variance proportional (but not equal) to n*p*(1-p). For comparison, you might want to look up the "negative binomial type I" as defined by Hardin and Hilbe, which is "quasi-Poisson" in the same sense. See ?model.matrix ?plogis ?dbetabinom in the emdbook package (and probably elsewhere: install the sos package and try findFn("beta-binomial")) From vincy_pyne at yahoo.ca Tue Oct 4 14:26:12 2011 From: vincy_pyne at yahoo.ca (Vincy Pyne) Date: Tue, 4 Oct 2011 05:26:12 -0700 (PDT) Subject: [R] Matching two datasets and updating values Message-ID: <1317731172.83963.YahooMailClassic@web120309.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From petr.pikal at precheza.cz Tue Oct 4 14:37:34 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Tue, 4 Oct 2011 14:37:34 +0200 Subject: [R] Matching two datasets and updating values In-Reply-To: <1317731172.83963.YahooMailClassic@web120309.mail.ne1.yahoo.com> References: <1317731172.83963.YahooMailClassic@web120309.mail.ne1.yahoo.com> Message-ID: Hi > > Dear R forum > > I have two data frames, with category and cat_val forming one data frame and > cust and cust_category forming another data frame.
> > category = c("C", "D", "B", "A") > cat_val = c(0.10, 0.25, 0.40, 0.54) > cust = c("cust_1", "cust_2", "cust_3", "cust_4", "cust_5", "cust_6", > "cust_7", "cust_8", "cust_9", "cust_10") > cust_category = c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D") > > Thus, I have > > > category > [1] "C" "D" "B" "A" > > > cat_val > [1] 0.10 0.25 0.40 0.54 > > > cust > [1] "cust_1" "cust_2" "cust_3" "cust_4" "cust_5" > [6] "cust_6" "cust_7" "cust_8" "cust_9" "cust_10" > > > cust_category > [1] "C" "A" "A" "A" "A" "C" "D" "B" "B" "D" > > My problem is to match 'cust_category' with 'category' and accordingly > select the value assigned to that category. In other words, the 1st > element of cust_category is "C", so it should select the value 0.10; the > second element is "A", so it should assign the value 0.54 to it. So > effectively I should get

What about merge?

a<-data.frame(category, cat_val)
b<-data.frame(cust, cust_category)
merge(a,b, by.x="category", by.y="cust_category")

   category cat_val    cust
1         A    0.54  cust_3
2         A    0.54  cust_4
3         A    0.54  cust_5
4         A    0.54  cust_2
5         B    0.40  cust_8
6         B    0.40  cust_9
7         C    0.10  cust_1
8         C    0.10  cust_6
9         D    0.25  cust_7
10        D    0.25 cust_10

Regards
Petr

> > cust cust_category cat_val > cust_1 C 0.10 > cust_2 A 0.54 > cust_3 A 0.54 > ............................................ > cust_10 D 0.25 > > > Kindly guide > > Regards > > Vincy
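Note that merge() sorts its result by the key column, so in Petr's output the customers come back grouped by category rather than in their original order. If the original customer order matters, a plain lookup with match() is an alternative. The following is a minimal sketch using the vectors from Vincy's post; the name `result` is only illustrative.

```r
category      <- c("C", "D", "B", "A")
cat_val       <- c(0.10, 0.25, 0.40, 0.54)
cust          <- paste0("cust_", 1:10)
cust_category <- c("C", "A", "A", "A", "A", "C", "D", "B", "B", "D")

# match() returns, for each customer's category, its position in 'category';
# indexing cat_val with those positions looks up one value per customer,
# keeping the customers in their original order
result <- data.frame(cust, cust_category,
                     cat_val = cat_val[match(cust_category, category)],
                     stringsAsFactors = FALSE)
print(result)
```

A category with no match in `category` would get NA rather than silently disappearing, which is often the safer behaviour for this kind of lookup.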
From jholtman at gmail.com Tue Oct 4 14:43:34 2011 From: jholtman at gmail.com (jim holtman) Date: Tue, 4 Oct 2011 08:43:34 -0400 Subject: [R] Parsing variable-length delimited strings into a matrix In-Reply-To: <0D642FB35639AA4996C2E5782E10FBC601162E@exchange01.well.ox.ac.uk> References: <0D642FB35639AA4996C2E5782E10FBC601162E@exchange01.well.ox.ac.uk> Message-ID: Will this do it for you:

> x <- readLines(textConnection("A,B,C
+ B,B
+ A,AA,C
+ A,B,BB,BBB,B,B"))
> closeAllConnections()
> x.s <- strsplit(x, ',')
> # determine max length
> x.max <- max(sapply(x.s, length))
> # create character matrix
> x.mat <- matrix(
+ sapply(x.s, function(a) c(a, rep(NA, x.max - length(a))))
+ , byrow = TRUE
+ , ncol = x.max
+ )
> x.mat
     [,1] [,2] [,3] [,4]  [,5] [,6]
[1,] "A"  "B"  "C"  NA    NA   NA
[2,] "B"  "B"  NA   NA    NA   NA
[3,] "A"  "AA" "C"  NA    NA   NA
[4,] "A"  "B"  "BB" "BBB" "B"  "B"

On Mon, Oct 3, 2011 at 11:40 AM, Benjamin Wright wrote: > > I'm struggling to find a way of parsing a vector of data in this sort of form: > > A,B,C > B,B > A,AA,C > A,B,BB,BBB,B,B > > into a matrix (or data frame). The catch is that I don't know a priori how many entries there will be in each element, nor how many characters there will be. strsplit(vec,",") gets me a list, but I can't find a way of turning the list into a matrix. unlist(lst) destroys the length data and do.call("rbind", lst) fails because of the uneven lengths. It is possible to go through the vector element by element, but that has proved too slow for my purposes. > > Is there a reasonably quick method of achieving this in a vector-oriented way? > > Cheers, > > Ben
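For comparison with the NA-padding approach in Jim's reply, here is a more compact sketch of the same idea. It is offered as a sketch rather than a tested recommendation, and it assumes R >= 3.2.0 for lengths(). Each split vector is padded to the longest length with `length<-` (which fills with NA), and the rows are then bound with do.call(rbind, ...). No textConnection() is needed because the strings are given directly.

```r
x <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")

parts <- strsplit(x, ",", fixed = TRUE)  # list of character vectors
n <- max(lengths(parts))                 # length of the longest row

# `length<-` extends a vector with NA, so every element becomes length n,
# and rbind can then stack them without recycling
x.mat <- do.call(rbind, lapply(parts, `length<-`, n))
x.mat
```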
-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From mentor_ at gmx.net Tue Oct 4 14:43:55 2011 From: mentor_ at gmx.net (syrvn) Date: Tue, 4 Oct 2011 05:43:55 -0700 (PDT) Subject: [R] texi2dvi problem when compiling incorrect Latex code Message-ID: <1317732235196-3870827.post@n4.nabble.com> Hello, I am working on a big R project using Eclipse/StatET/Texlipse. I'd like to write a LaTeX document within that project but DO NOT want to Sweave it. It's pure LaTeX. Via the external tools configurations I set up 2 different versions to ensure that my LaTeX document is processed correctly.

Version 1 (System Call):
library(tools)
setwd("${container_loc}")
file = "${resource_loc:${source_file_path}}"
try(system(paste("texi2pdf", shQuote(file)), intern=TRUE))

Version 2 (R Call):
library(tools)
setwd("${container_loc}")
texi2dvi(file = "${resource_loc:${source_file_path}}", pdf = TRUE, quiet = FALSE)

Both versions work well as long as there is no error in my LaTeX code. As soon as there is an error, texi2pdf / texi2dvi does not finish, because the program waits for user input (usually just pressing the Enter key). The problem is that R only shows the output after the whole program has finished, so I always end up having to kill my R console. Is there any workaround for that? Syrvn -- View this message in context: http://r.789695.n4.nabble.com/texi2dvi-problem-when-compiling-incorrect-Latex-code-tp3870827p3870827.html Sent from the R help mailing list archive at Nabble.com. From alexandrovich at mathematik.uni-marburg.de Tue Oct 4 14:04:02 2011 From: alexandrovich at mathematik.uni-marburg.de (Grigory Alexandrovich) Date: Tue, 04 Oct 2011 14:04:02 +0200 Subject: [R] Problem with .C In-Reply-To: <4D9AE45D.70205@mathematik.uni-marburg.de> References: <4D9AE45D.70205@mathematik.uni-marburg.de> Message-ID: <4E8AF632.1030803@mathematik.uni-marburg.de> Hello, I wrote a function in C, which works fine when called from the main function in C.
But as soon as I try to call this function from R, like .C('foo', as.double(x), as.integer(y)), the program crashes. I created a DLL with the command R --arch x64 CMD SHLIB foo.c and loaded it into R with dyn.load(). What can be the cause of such behaviour? Again, the C function itself works, but not when called from R. Thanks Grigory Alexandrovich From Ashley.Houlden at manchester.ac.uk Tue Oct 4 10:45:15 2011 From: Ashley.Houlden at manchester.ac.uk (Ashley Houlden) Date: Tue, 4 Oct 2011 08:45:15 +0000 Subject: [R] Adonis and nmds help and questions for a novice. Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From benzerfa at gmx.ch Tue Oct 4 11:50:14 2011 From: benzerfa at gmx.ch (Samir Benzerfa) Date: Tue, 4 Oct 2011 11:50:14 +0200 Subject: [R] creating subsets and calculating weights Message-ID: <000001cc827b$050650b0$0f12f210$@ch> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From deena.fakhry at gmail.com Tue Oct 4 11:10:13 2011 From: deena.fakhry at gmail.com (D.Emad) Date: Tue, 4 Oct 2011 02:10:13 -0700 (PDT) Subject: [R] Giant font on the R plots... Message-ID: <1317719413281-3870335.post@n4.nabble.com> Hello, I've been facing a really stupid problem... When I try to plot using heatplot or hclust or any similar function, the labels of the x-axis (which are the sample names) are giant and overlapping. I can't even read the sample names! I tried cex.lab = 0.5; it helped only with the y-axis and not the x-axis... Any help please?! -- View this message in context: http://r.789695.n4.nabble.com/Giant-font-on-the-R-plots-tp3870335p3870335.html Sent from the R help mailing list archive at Nabble.com.
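On D.Emad's question: in base graphics, cex.lab scales the axis titles, not the tick labels. Tick labels are controlled by cex.axis. For hclust dendrograms the leaf labels follow cex, and heatmap() in package stats has its own cexRow/cexCol arguments. The sketch below uses made-up data (the matrix `m` and its sample names are hypothetical) and draws to a temporary PDF file so it runs non-interactively. heatplot() comes from an add-on package and its arguments are not covered here.

```r
# Hypothetical data: 10 genes x 20 samples with long sample names
set.seed(1)
m <- matrix(rnorm(200), nrow = 10,
            dimnames = list(paste0("gene", 1:10),
                            paste0("a_rather_long_sample_name_", 1:20)))

pdf(tempfile(fileext = ".pdf"))  # draw off-screen so the example runs anywhere

# Dendrogram of the samples: 'cex' shrinks the leaf labels
hc <- hclust(dist(t(m)))
plot(hc, cex = 0.5)

# stats::heatmap() sizes row/column labels via cexRow/cexCol
heatmap(m, cexRow = 0.8, cexCol = 0.5)

# Ordinary plot: cex.axis shrinks tick labels, las = 2 turns them vertical
plot(1:20, colMeans(m), xaxt = "n", xlab = "")
axis(1, at = 1:20, labels = colnames(m), las = 2, cex.axis = 0.5)

invisible(dev.off())
```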
From glorykwok at hotmail.com Tue Oct 4 10:00:11 2011 From: glorykwok at hotmail.com (pigpigmeow) Date: Tue, 4 Oct 2011 01:00:11 -0700 (PDT) Subject: [R] About stepwise regression problem Message-ID: <1317715211875-3870217.post@n4.nabble.com> First of all, I have this GAM:

noxd<-gam(newNOX~pressure+maxtemp+s(avetemp,bs="cr")+s(mintemp,bs="cr")+s(RH,bs="cr")+s(solar,bs="cr")+s(windspeed,bs="cr")+s(transport,bs="cr"),family=gaussian (link=log),groupD,methods=REML)

Then I type summary(noxd), which shows:

Family: gaussian
Link function: log

Formula:
newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(avetemp, bs = "cr") + s(mintemp, bs = "cr") + RH + s(solar, bs = "cr") + s(windspeed, bs = "cr") + s(transport, bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.721513   0.049108  55.419   <2e-16 ***
pressure    0.028988   0.019434   1.492    0.140
RH          0.005228   0.009763   0.535    0.594
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df     F p-value
s(maxtemp)   6.346  7.276 1.223 0.29991
s(avetemp)   1.000  1.000 0.226 0.63562
s(mintemp)   1.908  2.396 1.066 0.35871
s(solar)     3.797  4.490 2.164 0.07359 .
s(windspeed) 5.305  6.341 2.346 0.03648 *
s(transport) 7.234  7.984 2.807 0.00884 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.307   Deviance explained = 49.1%
GCV score = 61.136  Scale est. = 44.49     n = 105

I eliminate the term with the largest p-value, s(avetemp), then type summary(no2d), which shows:

Family: gaussian
Link function: log

Formula:
newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(mintemp, bs = "cr") + RH + s(solar, bs = "cr") + s(windspeed, bs = "cr") + s(transport, bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.720973   0.048834  55.719   <2e-16 ***
pressure    0.031346   0.019040   1.646    0.104
RH          0.006165   0.009583   0.643    0.522
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df     F p-value
s(maxtemp)   6.499  7.425 1.450  0.1942
s(mintemp)   1.975  2.487 1.788  0.1655
s(solar)     3.925  4.628 2.118  0.0770 .
s(windspeed) 5.373  6.417 2.967  0.0101 *
s(transport) 7.043  7.822 2.785  0.0097 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.316   Deviance explained = 49.2%
GCV score = 59.746  Scale est. = 43.919    n = 105

I then eliminate the term with the largest remaining p-value, RH, and type summary(no2d), which shows:

Family: gaussian
Link function: log

Formula:
newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(mintemp, bs = "cr") + s(solar, bs = "cr") + s(windspeed, bs = "cr") + s(transport, bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.72001    0.04859  55.974   <2e-16 ***
pressure     0.02978    0.01878   1.586    0.117
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df     F p-value
s(maxtemp)   6.544  7.468 1.654 0.12830
s(mintemp)   1.952  2.460 1.697 0.18301
s(solar)     3.977  4.686 2.869 0.02211 *
s(windspeed) 5.381  6.425 2.641 0.01953 *
s(transport) 7.052  7.830 3.348 0.00257 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.321   Deviance explained =   49%
GCV score = 58.61  Scale est. = 43.591    n = 105

Next I remove the s(mintemp) term, and so on, until:

Family: gaussian
Link function: log

Formula:
newNO2 ~ s(windspeed, bs = "cr")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.78159    0.04701   59.16   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df    F p-value
s(windspeed) 1.775  2.251 4.54  0.0101 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =    0.1   Deviance explained = 11.5%
GCV score = 59.348  Scale est. = 57.78     n = 105

Finally only the s(windspeed) term remains; my significance level is 0.05. I have some questions. First, is the backward elimination performed correctly?
Second, is it possible to run the process (backward elimination) automatically? Third, I noticed that the column for the linear part is labelled "Pr(>|t|)" and the one for the smoothing part "p-value". Do these two terms have the same meaning? -- View this message in context: http://r.789695.n4.nabble.com/About-stepwise-regression-problem-tp3870217p3870217.html Sent from the R help mailing list archive at Nabble.com. From markm0705 at gmail.com Tue Oct 4 14:26:20 2011 From: markm0705 at gmail.com (markm0705) Date: Tue, 4 Oct 2011 05:26:20 -0700 (PDT) Subject: [R] Plotting a polygon with xyplot Message-ID: <1317731180152-3870788.post@n4.nabble.com> Dear R helpers, I would like to plot a string of points as a polygon in xyplot. I'm a bit lost as to how to get the points plotting in the correct order. I would also like some hints on how to render or fill the polygon. Script below and data file attached. Thanks, Markm

library("lattice")
# set size of the window
windows(height=7, width=10,rescale=c("fixed"))
Data_poly<- read.table("111004_Lode_Outlines.csv",header = TRUE,sep = ",",)
xyplot(z~y,
  data=Data_poly,
  type="l"
)

http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv 111004_Lode_Outlines.csv -- View this message in context: http://r.789695.n4.nabble.com/Plotting-a-polygon-with-xyplot-tp3870788p3870788.html Sent from the R help mailing list archive at Nabble.com. From pedabreu at gmail.com Tue Oct 4 12:40:20 2011 From: pedabreu at gmail.com (pedabreu) Date: Tue, 4 Oct 2011 03:40:20 -0700 (PDT) Subject: [R] package.skeleton generates ".env = <environment>" Message-ID: <1317724820500-3870577.post@n4.nabble.com> Hello, I am trying to create a package using package.skeleton. I use the R.oo package to create object-oriented classes. When I use package.skeleton, it creates the following file:

classA <-
structure(function()
{
  extend(Object(),"Class A",
    .var1= NULL)
}
, .env = <environment>, class = c("Class", "Object"), formals = c("public", "class"), modifiers = c("public", "class"))

Then I build it using R CMD build myPkg.
When I try to install the package with install.packages, it gives this error: " /tmp/RtmpaOZ7IQ/R.INSTALL412da433/JSSbase/R/GTHeuristic.R:7:10: unexpected '<' 6: } 7: , .env = <" Why does package.skeleton create ".env = <environment>"? Thank you -- View this message in context: http://r.789695.n4.nabble.com/package-skeleton-generates-env-environment-tp3870577p3870577.html Sent from the R help mailing list archive at Nabble.com. From peter_minting at hotmail.com Tue Oct 4 13:30:05 2011 From: peter_minting at hotmail.com (Peter Minting) Date: Tue, 4 Oct 2011 11:30:05 +0000 Subject: [R] Rug plot curve reversal Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From m.ali.abbas at gmail.com Tue Oct 4 14:52:48 2011 From: m.ali.abbas at gmail.com (Ali.Abbas) Date: Tue, 4 Oct 2011 05:52:48 -0700 (PDT) Subject: [R] Correlation based on the attributes of vertices Message-ID: <1317732768546-3870844.post@n4.nabble.com> Dear all, I have a directed graph (an igraph object, to be more precise) which has some vertex attributes (like dorm, year, etc.). The edges and the graph itself do not have any attributes. Based on the vertex attributes, I'd like to calculate correlation along the edges for the whole graph (e.g. how likely are people in the same dorm to be connected?). I'd also like to calculate the correlation between attributes for the whole graph (how correlated are the dorm and year attributes?). Could you kindly tell me how to go about it? I thought of building a list just like the graph's edge list, and then replacing each source and destination by its attribute value. For instance, instead of the edge (0->1), I would use (dorm_valueOf(0) -> dorm_valueOf(1)) and then run cor() over it. It does not seem like a nice solution. On a side note, how does one get the source and destination of an edge via the edge iterator? For example, I'd like to know the indices of the source and target vertices of the first edge, E(graph)[0]. How can I extract this information? Thanks!
Best Regards, Ali P.S. I have already posted the same question on the igraph mailing list. On receiving no response from there, I am posting it over here. -- View this message in context: http://r.789695.n4.nabble.com/Correlation-based-on-the-attributes-of-vertices-tp3870844p3870844.html Sent from the R help mailing list archive at Nabble.com. From francy.casalino at gmail.com Tue Oct 4 11:29:54 2011 From: francy.casalino at gmail.com (francesca casalino) Date: Tue, 4 Oct 2011 10:29:54 +0100 Subject: [R] Import in R with White Spaces In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From francy.casalino at gmail.com Tue Oct 4 11:34:29 2011 From: francy.casalino at gmail.com (francesca casalino) Date: Tue, 4 Oct 2011 10:34:29 +0100 Subject: [R] Merge two data frames and find common values and non-matching values In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From francy.casalino at gmail.com Tue Oct 4 13:35:30 2011 From: francy.casalino at gmail.com (francesca casalino) Date: Tue, 4 Oct 2011 12:35:30 +0100 Subject: [R] Merge two data frames and find common values and non-matching values In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gunter.berton at gene.com Tue Oct 4 15:42:39 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 4 Oct 2011 06:42:39 -0700 Subject: [R] inconsistent behavior of summary function In-Reply-To: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> References: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From ken.knoblauch at inserm.fr Tue Oct 4 16:14:52 2011 From: ken.knoblauch at inserm.fr (Ken Knoblauch) Date: Tue, 4 Oct 2011 14:14:52 +0000 Subject: [R] Plotting a polygon with xyplot References: <1317731180152-3870788.post@n4.nabble.com> Message-ID: markm0705 gmail.com> writes: > I would like to plot a string of points as a polygon in xyplot. I'm a bit > lost as to how to get the points plotting in the correct order. I would > also like some hints on how to render or fill the polygon. > > Script below and data file attached > > Thanks > > Markm > > library("lattice") > > # set size of the window > windows(height=7, width=10,rescale=c("fixed")) > > Data_poly<- read.table("111004_Lode_Outlines.csv",header = TRUE,sep = ",",) > > xyplot(z~y, > data=Data_poly, > type="l" > ) http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv > 111004_Lode_Outlines.csv > Before you try this with lattice, you might spend some time getting your abscissa values in an order that will plot the contour in a sequential fashion. It's not obvious how to do this a priori. Here is a simple-minded attempt after looking at your graphic, just using base graphics. Maybe it will be enough for you to tweak it a bit further to get what you want.
Data_poly<- read.table("http://r.789695.n4.nabble.com/file/n3870788/111004_Lode_Outlines.csv", header = TRUE,sep = ",",)
par(mfrow = c(1, 2), pty = "s")
plot(z ~ y, Data_poly, type = "l")
fh <- with(Data_poly, which(z > 240))
D_poly <- rbind(Data_poly[fh, ], Data_poly[-rev(fh), ])
D_poly <- rbind(D_poly, Data_poly[1, ])
plot(z ~ y, D_poly, type = "n")
with(D_poly, polygon(y, z, col = "lightblue"))

-- Ken Knoblauch Inserm U846 Stem-cell and Brain Research Institute Department of Integrative Neurosciences 18 avenue du Doyen Lépine 69500 Bron France tel: +33 (0)4 72 91 34 77 fax: +33 (0)4 72 91 34 61 portable: +33 (0)6 84 10 64 10 http://www.sbri.fr/members/kenneth-knoblauch.html From murdoch.duncan at gmail.com Tue Oct 4 16:21:10 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Tue, 04 Oct 2011 10:21:10 -0400 Subject: [R] package.skeleton generates ".env = <environment>" In-Reply-To: <1317724820500-3870577.post@n4.nabble.com> References: <1317724820500-3870577.post@n4.nabble.com> Message-ID: <4E8B1656.5020209@gmail.com> On 04/10/2011 6:40 AM, pedabreu wrote: > Hello, > > i trying to create a package using package.skeleton. I use R.oo package to > create oriented-object classes. When i use package.skeleton, this creates > the following file: > > classA<- > structure(function() > { > > extend(Object(),"Class A", > .var1= NULL) > > > } > , .env = <environment>, class = c("Class", "Object"), formals = c("public", > "class"), modifiers = c("public", "class")) > > Then i compile using R CMD build myPkg. > > when i try to install.package and give this error: > > " /tmp/RtmpaOZ7IQ/R.INSTALL412da433/JSSbase/R/GTHeuristic.R:7:10: > unexpected '<' > 6: } > 7: , .env =<" > > why the package.skeleton creates ".env = <environment>"?? package.skeleton tries to deparse your code, but in some cases, that can't be done. As ?deparse says, "However, not all objects are deparse-able even with this option and a warning will be issued if the function recognizes that it is being asked to do the impossible."
What you need to do is to copy your original source code that created classA into the package source. Presumably it uses some functions from R.oo to construct the object properly. Duncan Murdoch From hadley at rice.edu Tue Oct 4 16:23:23 2011 From: hadley at rice.edu (Hadley Wickham) Date: Tue, 4 Oct 2011 09:23:23 -0500 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: <4E89EFE3.5050804@noaa.gov> References: <4E89EFE3.5050804@noaa.gov> Message-ID: On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams wrote: > I'm interested in creating a graphic -like- this: > > c <- ggplot(mtcars, aes(qsec, wt)) > c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2, alpha = 0.2) > > but I need to show 2 sets of bands (with different shading) using 5%, 25%, > 75%, 95% limits that I specify and where the heavy blue line is the median. > I don't understand how to do this with ggplot2. Exactly what sort of limits do you want? It sounds like maybe you are looking for smoothed quantile regression. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ From gunter.berton at gene.com Tue Oct 4 16:40:48 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 4 Oct 2011 07:40:48 -0700 Subject: [R] Plotting a polygon with xyplot In-Reply-To: References: <1317731180152-3870788.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.jahn at ufz.de Tue Oct 4 16:11:14 2011 From: michael.jahn at ufz.de (Michael Jahn) Date: Tue, 04 Oct 2011 16:11:14 +0200 Subject: [R] Adding multiple gates/filters in densityplot In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From pat2 at hi.is Tue Oct 4 16:34:12 2011 From: pat2 at hi.is (Panagiotis) Date: Tue, 4 Oct 2011 07:34:12 -0700 (PDT) Subject: [R] Question about linear mixed effects model (nlme) Message-ID: <1317738852033-3871203.post@n4.nabble.com> Hi, I fitted a linear mixed-effects model to my data using the nlme package: lme2<-lme(distance~temperature*condition, random=~+1|trial, data) and then ran anova(). I would like to know whether it is possible to get the least-squares means for the interaction effect with the corresponding 95% CIs, and then plot these values. Thank you Panagiotis -- View this message in context: http://r.789695.n4.nabble.com/Question-about-linear-mixed-effects-model-nlme-tp3871203p3871203.html Sent from the R help mailing list archive at Nabble.com. From Torsten.Hothorn at r-project.org Mon Oct 3 18:00:29 2011 From: Torsten.Hothorn at r-project.org (Torsten Hothorn) Date: Mon, 3 Oct 2011 18:00:29 +0200 Subject: [R] [R-pkgs] `partykit': A Toolkit for Recursive Partytioning Message-ID: New package `partykit': A Toolkit for Recursive Partytioning The purpose of the package is to provide a toolkit with infrastructure for representing, summarizing, and visualizing tree-structured regression and classification models. Thus, the focus is not on _inferring_ such a tree structure from data but on _representing_ a given tree so that printing/plotting and computing predictions can be performed in a standardized way. In particular, this unified infrastructure can be used for reading/coercing tree models from different sources (packages `rpart', `RWeka', `PMML') yielding objects that share functionality for `print()', `plot()', and `predict()' methods.
Impatient users will hopefully have fun with:

install.packages("partykit")
library("partykit")
library("rpart")
### from ?rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
plot(as.party(fit))

Best, Torsten & Achim _______________________________________________ R-packages mailing list R-packages at r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages From hadley at rice.edu Tue Oct 4 17:10:56 2011 From: hadley at rice.edu (Hadley Wickham) Date: Tue, 4 Oct 2011 10:10:56 -0500 Subject: [R] ggplot2: expression() in legend labels? In-Reply-To: <20110924084903.GA24599@CasperVector> References: <20110924084903.GA24599@CasperVector> Message-ID: You need to set the labels... Hadley On Sat, Sep 24, 2011 at 3:49 AM, Casper Ti. Vector wrote: > Is there any way to use expression() in legend labels with ggplot2? > > It seems that things like >> scale_shape_manual(value = c( >> x = expression(italic(x)), >> y = expression(italic(y)) >> )) > don't work. > > Thanks very much :) > > -- > Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134, > valid from 2010 to 2013) from a key server. From gunter.berton at gene.com Tue Oct 4 17:14:58 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 4 Oct 2011 08:14:58 -0700 Subject: [R] Question about linear mixed effects model (nlme) In-Reply-To: <1317738852033-3871203.post@n4.nabble.com> References: <1317738852033-3871203.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From lists at revelle.net Tue Oct 4 17:16:24 2011 From: lists at revelle.net (William Revelle) Date: Tue, 4 Oct 2011 10:16:24 -0500 Subject: [R] how do i put two scatterplots on same graph In-Reply-To: <4E8ACFDF.9070108@knmi.nl> References: <1317709183813-3870030.post@n4.nabble.com> <4E8ACFDF.9070108@knmi.nl> Message-ID: <38FDA808-D79C-45DA-8F65-EB0651CB8663@revelle.net> If the data are from one data.frame (e.g., the iris data set), then simply label the red and white flowers with different colors, e.g., with the iris data set:

plot(iris$Sepal.Length,iris$Sepal.Width,col=c("red","blue","black")[iris$Species],pch=c(16:18)[iris$Species])

Bill On Oct 4, 2011, at 4:20 AM, Paul Hiemstra wrote: > On 10/04/2011 06:19 AM, jricci wrote: >> Have two sets of scatterplot data >> hypothetically >> a) stem lenght vs number of petals in red flowers >> b) stem lenght vs number of petals in white flowers >> >> want to place on same scatter plot with same x,y axis but different collored >> markers >> >> How do I do this in R >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/how-do-i-put-two-scatterplots-on-same-graph-tp3870030p3870030.html >> Sent from the R help mailing list archive at Nabble.com. > > Hi, > > You could take a look at the ggplot2 package. > > good luck, > Paul > > -- > Paul Hiemstra, Ph.D. > Global Climate Division > Royal Netherlands Meteorological Institute (KNMI) > Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 > P.O.
Box 201 | 3730 AE | De Bilt > tel: +31 30 2206 494 > > http://intamap.geo.uu.nl/~paul > http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 William Revelle http://personality-project.org/revelle.html Professor http://personality-project.org Department of Psychology http://www.wcas.northwestern.edu/psych/ Northwestern University http://www.northwestern.edu/ Use R for psychology http://personality-project.org/r It is 6 minutes to midnight http://www.thebulletin.org From gunter.berton at gene.com Tue Oct 4 17:11:38 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 4 Oct 2011 08:11:38 -0700 Subject: [R] inconsistent behavior of summary function In-Reply-To: References: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> <1317711522584-3870106.post@n4.nabble.com> <4E8AC17F.2020200@xtra.co.nz> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Tue Oct 4 17:41:47 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 04 Oct 2011 17:41:47 +0200 Subject: [R] file input with readLines In-Reply-To: <39B5ED61E7BFC24FA8277B6DE92A9A3F0415418D@fkimlki01.enterprise.afmc.ds.af.mil> References: <39B5ED61E7BFC24FA8277B6DE92A9A3F0415418D@fkimlki01.enterprise.afmc.ds.af.mil> Message-ID: <4E8B293B.4070402@statistik.tu-dortmund.de> On 03.10.2011 19:19, Cable, Sam B Civ USAF AFMC AFRL/RVBXI wrote: > I am using readLines to read a fairly large ASCII file. readLines reads > a fixed number of lines, then other R code processes the data, then > readLines reads the same number of lines again, then other R code > processes the data, then ....
> Sort of like:
>
> conn<-file('filename','r')
> for (chunk in 1:100000) {
>   Lines<-readLines(conn,n=25)
>   # process "Lines"
> }
>
> The code is working, but I notice that it slows down greatly as time > progresses. It took 2 seconds to read my first chunk of data, 4 seconds > to read the next chunk, 10 after that. The quasi-exponential trend has > slowed, thank goodness, but after about a hundred reads, the read time > for the next chunk is over a minute. Let me stress that the number of > lines read in each chunk of data is absolutely fixed. > > > > The only processing I am doing at this point is to parse the new data, > and rbind the results to an existing data frame. And that may be the interesting point. Have you tried allocating the whole data.frame in advance and assigning into it later? It is probably not readLines() that is slowing you down. A minute seems to be quite a lot for reasonably sized data. How many columns are we talking about? Uwe Ligges > Processing of new data > in no way depends on earlier data. > > > > So, my question is why is the reading taking longer as time goes on? Is > there a way to fix this? Is there a better method than readLines? > > > > Thanks. > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From grazia at stat.columbia.edu Tue Oct 4 17:44:31 2011 From: grazia at stat.columbia.edu (grazia at stat.columbia.edu) Date: Tue, 4 Oct 2011 11:44:31 -0400 (EDT) Subject: [R] adding a dummy variable... Message-ID: <1134.151.100.3.131.1317743071.squirrel@www.stat.columbia.edu> Hi all, I have a dataset of individuals where the variable ID corresponds to the identification of the household where the individual lives.
rel.head stands for the relationship with the household head, so rel.head=1 is the household head, rel.head=2 is the spouse, and rel.head=3 is a child. Here is an example of what the data look like:
df<-data.frame(ID=c("17100", "17100", "17101", "17102", "17103", "17103",
                    "17104", "17104", "17104", "17105", "17105"),
               rel.head=c("1","3","1","1","1", "2", "1", "2", "3", "1", "3"))
I want to add a dummy variable that is equal to 1 when these conditions hold simultaneously: a) the number of rows with the same ID is equal to 2; b) those two rows have rel.head=1 and rel.head=3. So my ideal output is:
      ID rel.head added.dummy
1  17100        1           1
2  17100        3           1
3  17101        1           0
4  17102        1           0
5  17103        1           0
6  17103        2           0
7  17104        1           0
8  17104        2           0
9  17104        3           0
10 17105        1           1
11 17105        3           1
Is there a simple way to do that? Can somebody help? Thanks in advance, Grazia From ligges at statistik.tu-dortmund.de Tue Oct 4 17:45:04 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 04 Oct 2011 17:45:04 +0200 Subject: [R] Installation from local Compiled directory In-Reply-To: References: Message-ID: <4E8B2A00.9050007@statistik.tu-dortmund.de> On 03.10.2011 18:16, Sandeep Patil wrote: > Hello everyone > > I have a manually compiled directory of gstat in a particular folder of my > Unix system. > I want to install this and am unable to use either of the following two > commands > > 1. R CMD INSTALL > 2. install.packages > If this is a precompiled (i.e. binary) package produced for this R version and this OS, then the trick is simply to copy the directory into your library. best, Uwe Ligges > I do not understand how to coax the above commands into locating the directory that > I have > compiled. > > Please understand that I have solved a number of related issues concerning > this > installation and it is a special case where > > 1. I cannot use a CRAN mirror to download and install > 2. I cannot install from a TAR file > > Essentially this is the only option I have.
> > Thank you > > Sandeep > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From peter.ruckdeschel at web.de Tue Oct 4 17:49:57 2011 From: peter.ruckdeschel at web.de (Peter Ruckdeschel) Date: Tue, 04 Oct 2011 17:49:57 +0200 Subject: [R] [Workshop] Finance with R Message-ID: <4E8B2B25.7000407@web.de> The Financial Mathematics department of Fraunhofer ITWM is offering a two-day workshop on Finance with R: %----------------------------------------------------- [Workshop] Finance with R %----------------------------------------------------- Oct 20, 2011, 10:00-17:00 and Oct 21, 2011, 9:00-16:00 Fraunhofer ITWM, Fraunhofer-Platz 1, 67663 Kaiserslautern, Germany %----------------------------------------------------- Scope and purpose %----------------------------------------------------- This workshop provides an introduction to R for professionals and academics in Finance. It gives an insight into what R offers for data analysis and statistics, importing data sets, generating graphics and preparing reports, with a focus on their relevance in Finance. Besides providing insight into financial modeling in R, we demonstrate in particular the use of the Rmetrics family of R packages as well as an R bridge to the QuantLib library. We also cover integration of R into Excel, interaction with Matlab, and import from Bloomberg. %------------------------ Benefits of attending %------------------------ The workshop provides insight into statistical models and concepts in R which are useful for various problems arising in Finance. The attendees will be able to import datasets into R, analyze them statistically and apply concepts from time series modeling.
In practical sessions, the attendees will learn and practice how to use R. The fee for the workshop is 500 EUR. For further details, see http://www.itwm.fraunhofer.de/en/departments/financial-mathematics/events/2011-workshop-series.html Peter Ruckdeschel -- Dr. habil. Peter Ruckdeschel, Abteilung Finanzmathematik, F3.17 Fraunhofer ITWM, Fraunhofer Platz 1, 67663 Kaiserslautern Phone: +49 631/31600-4699 Fax : +49 631/31600-5699 E-Mail : peter.ruckdeschel at itwm.fraunhofer.de http://www.itwm.fraunhofer.de/abteilungen/finanzmathematik/mitarbeiterinnen/mitarbeiter/dr-peter-ruckdeschel.html From ligges at statistik.tu-dortmund.de Tue Oct 4 17:51:55 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 04 Oct 2011 17:51:55 +0200 Subject: [R] handling constant factors in prediction using svm In-Reply-To: <1317711223804-3870093.post@n4.nabble.com> References: <1317711223804-3870093.post@n4.nabble.com> Message-ID: <4E8B2B9B.9010902@statistik.tu-dortmund.de> On 04.10.2011 08:53, Divyam wrote: > Hi users! > > I am fitting a model with several factor variables as independents using > svm. Since there are lots of categorical variables, the training and test > data sets have been created using the dummy.data.frame function from the dummies > package. I have a factor A in the training data set with 2 levels (0,1). In > the test set, this factor A has only 1 level (1) and hence when applying > dummy.data.frame, the variable gets dropped (and that's how I want it too). > The problem comes when I am trying to predict the test data, as an error is > thrown saying that object A0 is not found. Is there any way to solve this > problem? Errr, if you learned a model that predicts based on several variables, including A0, what do you expect to happen if A0 is not given? Well, you cannot predict. So if A0 is constant in your test cases, just supply it! To simplify, consider a linear model y = Xb + e. Now one column of X is missing for prediction. y will be undefined, obviously.
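Below is a minimal sketch of Uwe's point in base R, using made-up data frames and column names. The sketch does not use dummies::dummy.data.frame(); it builds the indicator columns with model.matrix() instead. The key step is to give the test factor the same levels as the training factor, so the constant indicator column A0 is still created (as all zeros) and is not dropped:

```r
# Hypothetical training data: factor A takes both levels "0" and "1"
train <- data.frame(A = factor(c("0", "1", "1", "0")),
                    x = c(1.2, 3.4, 2.2, 0.5))
# Hypothetical test data: only level "1" occurs
test <- data.frame(A = factor(c("1", "1")),
                   x = c(2.0, 1.1))

# Give the test factor the training levels, so the unused level "0" is kept
test$A <- factor(test$A, levels = levels(train$A))

# "- 1" drops the intercept, so every level of A gets its own indicator column
X_train <- model.matrix(~ A + x - 1, data = train)
X_test  <- model.matrix(~ A + x - 1, data = test)

colnames(X_test)  # "A0" "A1" "x", the same columns as X_train
X_test[, "A0"]    # all zeros: A0 is supplied as a constant column
```

Because both matrices now have identical columns, they can be used for fitting and for prediction alike. This sketch does not assume how a particular svm implementation expects its input.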
Uwe Ligges > Thanks > Divya > > -- > View this message in context: http://r.789695.n4.nabble.com/handling-constant-factors-in-prediction-using-svm-tp3870093p3870093.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jdospina at gmail.com Tue Oct 4 17:39:45 2011 From: jdospina at gmail.com (jdospina) Date: Tue, 4 Oct 2011 08:39:45 -0700 (PDT) Subject: [R] The use of period in function names and variable names In-Reply-To: References: Message-ID: <1317742785460-3871407.post@n4.nabble.com> Hello. Not at all, in the way you have shown. Just to improve your code's readability, try to avoid variable names that begin with a period (example: .hello). Unlike in Matlab (for example), the period in R is not an operator for accessing an object property. -- View this message in context: http://r.789695.n4.nabble.com/The-use-of-period-in-function-names-and-variable-names-tp3869913p3871407.html Sent from the R help mailing list archive at Nabble.com. From xn8spicer at gmail.com Tue Oct 4 16:42:44 2011 From: xn8spicer at gmail.com (Jeanne M. Spicer) Date: Tue, 4 Oct 2011 10:42:44 -0400 Subject: [R] inconsistent behavior of summary function In-Reply-To: <4E8AC17F.2020200@xtra.co.nz> References: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> <1317711522584-3870106.post@n4.nabble.com> <4E8AC17F.2020200@xtra.co.nz> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From caspervector at gmail.com Tue Oct 4 17:59:13 2011 From: caspervector at gmail.com (Casper Ti. Vector) Date: Tue, 4 Oct 2011 23:59:13 +0800 Subject: [R] ggplot2: expression() in legend labels?
In-Reply-To: References: <20110924084903.GA24599@CasperVector> Message-ID: <20111004155913.GA4149@CasperVector> Hmm, that was my mistake when composing this mail, but I really did encounter the problem at that time. Nevertheless, I cannot reproduce the problem now either; perhaps I just made another mistake at that time. Thanks all the same, and sorry for the disturbance anyway :| On Tue, Oct 04, 2011 at 10:10:56AM -0500, Hadley Wickham wrote: > You need to set the labels... -- Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134, valid from 2010 to 2013) from a key server. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: Digital signature URL: From bbolker at gmail.com Tue Oct 4 18:01:14 2011 From: bbolker at gmail.com (Ben Bolker) Date: Tue, 4 Oct 2011 16:01:14 +0000 Subject: [R] Question about linear mixed effects model (nlme) References: <1317738852033-3871203.post@n4.nabble.com> Message-ID: Bert Gunter gene.com> writes: > > Below. > > On Tue, Oct 4, 2011 at 7:34 AM, Panagiotis hi.is> wrote: > > > Hi, > > > > I applied a linear mixed effects model to my data using the nlme package. > > lme2<-lme(distance~temperature*condition, random=~+1|trial, data) and then > > anova. > > I want to ask if it is possible to get the least squares means for the > > interaction effect and the corresponding 95% CI, and then plot these values. > > > > Uh-Oh. You may have unloosed "The Wrath of Khan" -- or at least of Venables. > (An explanation of this cryptic remark should follow from others, so please > do not ask me what it means if you do not know). > You should probably ask (a version of) this question on the r-sig-mixed-models list instead. What do you mean by "the least squares means for the interaction effect"? How is it different from the estimate of the interaction parameter?
You can use the predict() function if you want to calculate predicted values for any particular combination of predictors (you probably want to specify level=0 to get the population-level effects). Getting 'good' confidence intervals for mixed-effect models is surprisingly difficult. If you are willing to ignore the uncertainty of the among-trial variance, you can use a modification of the recipe found at http://glmm.wikidot.com/faq From mailinglist.honeypot at gmail.com Tue Oct 4 18:05:29 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 4 Oct 2011 12:05:29 -0400 Subject: [R] The use of period in function names and variable names In-Reply-To: <1317742785460-3871407.post@n4.nabble.com> References: <1317742785460-3871407.post@n4.nabble.com> Message-ID: Hi, On Tue, Oct 4, 2011 at 11:39 AM, jdospina wrote: > Hello. > > Not at all in the way you have shown. Just to improve your code > "readability", try to avoid naming your variables beginning with period > (example: .hello). Well, that's not exactly true. It's "common practice" to name variables with a leading period if you want them to be considered "hidden," in some respect. See the `all.names` argument to the `ls` function, for instance. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From djmuser at gmail.com Tue Oct 4 18:08:09 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 4 Oct 2011 09:08:09 -0700 Subject: [R] ggplot2: expression() in legend labels?
In-Reply-To: <20111004155913.GA4149@CasperVector> References: <20110924084903.GA24599@CasperVector> <20111004155913.GA4149@CasperVector> Message-ID: Hi: Here's a reproducible example:
library(ggplot2)
d <- data.frame(grp = factor(rep(c('x', 'y'), each = 5)),
                ev = rnorm(10), dv = rnorm(10))
labl <- list(expression(italic('x')), expression(italic('y')))
ggplot(d, aes(x = ev, y = dv, shape = grp)) + geom_point() +
    scale_shape_manual('Group', breaks = levels(d$grp), values = 1:2,
                       labels = labl)
HTH, Dennis On Tue, Oct 4, 2011 at 8:59 AM, Casper Ti. Vector wrote: > Hmm, that's my fault when composing this mail, but the problem was > really encountered at that time. > Nevertheless, neither can I reproduce the problem now, perhaps I just > made another mistake at that time. > Thanks all the same, and sorry for the disturbance anyway :| > > On Tue, Oct 04, 2011 at 10:10:56AM -0500, Hadley Wickham wrote: >> You need to set the labels... > > -- > Using GPG/PGP? Please get my current public key (ID: 0xAEF6A134, > valid from 2010 to 2013) from a key server. > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > From ligges at statistik.tu-dortmund.de Tue Oct 4 18:11:40 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 4 Oct 2011 18:11:40 +0200 Subject: [R] Rug plot curve reversal In-Reply-To: References: Message-ID: <4E8B303C.6050705@statistik.tu-dortmund.de> On 04.10.2011 13:30, Peter Minting wrote: > > Dear R-help > Can anyone tell me why my curve appears the wrong way round on a rug plot? > I am using the same code as on pg 596 of the Crawley R-book. > mod<-glm(mort~logBd,binomial) What is mort, what is logBd? I don't have access to the book. I have hidden it in my other office so that nobody can find it anymore.
> par(mfrow=c(2,2))
> xv<-seq(0,8,0.01)
> yv<-predict(mod,list(logBd=xv),type="response")
> plot(logBd,mort)
> lines(xv,yv)
> I've tried swapping xv and yv around but no luck. Hopefully mort is a binary factor, i.e. with two levels. In that case its values are plotted at positions 1 and 2 on the y axis by plot(). yv is the response, i.e. it lies in the interval (0,1) if the binomial glm was successful, so it is on a different scale. So I guess lines(xv,yv+1) could help. What else I think about "The R Book" can be found in my book review published in "Statistical Papers". Best, Uwe Ligges > Thanks, > Pete > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From murdoch.duncan at gmail.com Tue Oct 4 18:18:02 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Tue, 04 Oct 2011 12:18:02 -0400 Subject: [R] The use of period in function names and variable names In-Reply-To: References: Message-ID: <4E8B31BA.8060705@gmail.com> On 04/10/2011 7:04 AM, S Ellison wrote: > See para 10.3.2 'Identifiers' in the R language definition (always distributed with R in the html help system), or ?make.names, for a concise statement of what constitutes a valid variable name in R. > > It's actually underscores that might give trouble with older versions, not '.'. But they'd have to be a lot older by R standards (pre 1.9.0). > > I am not sure why there has been a recent shift away from periods and towards camelCase in some R packages; Presumably the authors of those packages prefer camelCase. I don't think it's any more complicated than that. Duncan Murdoch > personally I find a period or underscore much more useful for making a variable name readable.
And a mix of camelCase and period.breaks makes it a lot harder to guess which case-sensitive string to use. The number of different combinations of case and period I end up trying for R.Version (occasionally used, never quite often enough to be automatic) defies belief ;-). > > > S Ellison > > > From: r-help-bounces at r-project.org On Behalf Of Smart Guy > > Sent: 04 October 2011 05:20 > > To: r-help at r-project.org > > Subject: [R] The use of period in function names and variable names > > > > Hi, > > I am looking for some guidance on whether I can use the > > period(.) in function names and variable names. > > ******************************************************************* > This email and any attachments are confidential. Any use...{{dropped:8}} > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Tue Oct 4 18:19:15 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 04 Oct 2011 18:19:15 +0200 Subject: [R] inconsistent behavior of summary function In-Reply-To: References: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> <1317711522584-3870106.post@n4.nabble.com> <4E8AC17F.2020200@xtra.co.nz> Message-ID: <4E8B3203.2040004@statistik.tu-dortmund.de> On 04.10.2011 16:42, Jeanne M. Spicer wrote: > I'm not sure how returning an incorrect result is ever a 'positive' feature, but at least the documentation could more clearly warn users that this method behaves differently in these cases -- summary(rock[,1]) vs summary(rock[,1:2]) -- and that the method can and does return incorrect results without any warning messages. What are you talking about? Probably it appeared earlier in this thread? Please always quote the message you are replying to.
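For reference, the two cases Jeanne compares can be reproduced with the rock data set that ships with R. The sketch below shows only how the subsetting forms differ in structure, which determines the summary() method that gets called. It does not address how the printed summaries are formatted:

```r
data(rock)  # built-in data frame with columns area, peri, shape, perm

one_col  <- rock[, 1]                # a single column is simplified to a vector
one_df   <- rock[, 1, drop = FALSE]  # drop = FALSE keeps a one-column data frame
two_cols <- rock[, 1:2]              # two columns stay a data frame

is.data.frame(one_col)   # FALSE
is.data.frame(one_df)    # TRUE
is.data.frame(two_cols)  # TRUE

# summary() therefore dispatches to different methods:
summary(one_col)  # summary.default() for a plain vector
summary(one_df)   # summary.data.frame(), the same method as summary(rock[, 1:2])
```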
Anyway, I guess you were looking for summary(rock[,1, drop=FALSE]): rock[,1] is simplified to a vector, while rock[,1:2] is still a matrix or data.frame (which of the two it is in your case, I do not know, since the context was not quoted). > I would encourage anyone teaching introductory R to look at the 'epicalc' package. The revamped function 'summ' in that package returns correct results regardless - summ(rock), summ(rock$area). In addition, when you only ask for one column you not only get the correct results, you also get a bonus distribution plot. > > I would like all of our students to use R, but little things like this are huge stumbling blocks for them. Then you told them about summary() before telling them how to deal with data structures correctly. And that is the most important part of learning R. I know from my courses that applied people do not like that, but I have always managed to convince them that this is the most important topic to learn about R. Best, Uwe Ligges > -jeanne > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Tue Oct 4 18:24:31 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 04 Oct 2011 18:24:31 +0200 Subject: [R] The use of period in function names and variable names In-Reply-To: <4E8B31BA.8060705@gmail.com> References: <4E8B31BA.8060705@gmail.com> Message-ID: <4E8B333F.4090208@statistik.tu-dortmund.de> On 04.10.2011 18:18, Duncan Murdoch wrote: > On 04/10/2011 7:04 AM, S Ellison wrote: >> See para 10.3.2 'Identifiers' in the R language definition (always >> distributed with R in the html help system), or ?make.names, for a >> concise statement of what constitutes a valid variable name in R.
>> >> It's actually underscores that might give trouble with older versions, >> not '.'. But they'd have to be a lot older by R standards (pre 1.9.0). >> >> I am not sure why there has been a recent shift away from periods and >> towards camelCase in some R packages; > > Presumably the authors of those packages prefer camelCase. I don't think > it's any more complicated than that. I switched to camelCase when I realized that periods are somewhat dangerous because they can conflict with S3 naming conventions: R CMD check rightly complained because I had used a name of the form generic.class where "generic" or "class" really was the name of an existing generic or class, which I had not realized before. Uwe > > Duncan Murdoch > > >> personally I find a period or underscore much more useful for making a >> variable name readable. And a mix of camelCase and period.breaks makes >> it a lot harder to guess which case-sensitive string to use. The >> number of different combinations of case and period I end up trying >> for R.Version (occasionally used, never quite often enough to be >> automatic) defies belief ;-). >> >> >> S Ellison >> >> > From: r-help-bounces at r-project.org On Behalf Of Smart Guy >> > Sent: 04 October 2011 05:20 >> > To: r-help at r-project.org >> > Subject: [R] The use of period in function names and variable names >> > >> > Hi, >> > I am looking for some guidance on whether I can use the >> > period(.) in function names and variable names. >> >> ******************************************************************* >> This email and any attachments are confidential. Any use...{{dropped:8}} >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.
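Two of the points made in this thread can be checked directly. The first is Steve's remark that a leading period hides a name from ls() unless all.names = TRUE. The second is Uwe's warning that a name of the form generic.class can be picked up by S3 dispatch. All object names below, including summary.results and the class "results", are hypothetical and were chosen only to trigger the collision:

```r
# 1. A leading period hides a name from ls() unless all.names = TRUE
.hidden_setting <- 42
visible_setting <- 1
hidden_in_ls     <- ".hidden_setting" %in% ls()                  # FALSE
hidden_in_ls_all <- ".hidden_setting" %in% ls(all.names = TRUE)  # TRUE

# 2. A generic.class-style name can be picked up by S3 dispatch.
# summary.results (hypothetical) was meant as an ordinary helper, but since
# summary() is a generic it also becomes the summary method for any object
# whose class is "results".
summary.results <- function(object, ...) "helper was called"
res <- structure(list(values = 1:3), class = "results")
dispatched <- summary(res)  # "helper was called", not the default summary
```

A camelCase name such as summaryResults cannot be mistaken for a method in this way, which fits the reason Uwe gives for switching.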
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Tue Oct 4 18:25:54 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 04 Oct 2011 18:25:54 +0200 Subject: [R] Problem with .C In-Reply-To: <4E8AF632.1030803@mathematik.uni-marburg.de> References: <4D9AE45D.70205@mathematik.uni-marburg.de> <4E8AF632.1030803@mathematik.uni-marburg.de> Message-ID: <4E8B3392.30105@statistik.tu-dortmund.de> Without seeing that C code, we cannot know. Have you read Writing R Extensions carefully? In particular, take care with memory allocation and printing, as described in the manual. Uwe Ligges On 04.10.2011 14:04, Grigory Alexandrovich wrote: > Hello, > > I wrote a function in C, which works fine if called from the > main function in C. > > But as soon as I try to call this function from R like .C('foo', > as.double(x), as.integer(y)), the program crashes. > > I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and > loaded it into R with dyn.load(). > > What can be the cause of such behaviour? > Again, the C function itself works, but not if called from R. > > Thanks > Grigory Alexandrovich > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Martyn.Byng at nag.co.uk Tue Oct 4 18:40:44 2011 From: Martyn.Byng at nag.co.uk (Martyn Byng) Date: Tue, 4 Oct 2011 17:40:44 +0100 Subject: [R] adding a dummy variable...
References: <1134.151.100.3.131.1317743071.squirrel@www.stat.columbia.edu> Message-ID: <49E76DF37649DC48A4CE882BC8CE51C901DF593E@nagmail2.nag.co.uk> Hi, I am sure there are better / more efficient ways of doing this, but the following seems to work ...
ids <- sapply(split(df,df$ID),function(x) {length(x$rel.head)==2 & any(x$rel.head==1) & any(x$rel.head==3)})
ids <- as.numeric(names(ids)[ids])
added.dummy <- as.numeric(df$ID%in%ids)
cbind(df,added.dummy)
Martyn -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of grazia at stat.columbia.edu Sent: 04 October 2011 16:45 To: r-help at r-project.org Subject: [R] adding a dummy variable... Hi all, I have a dataset of individuals where the variable ID corresponds to the identification of the household where the individual lives. rel.head stands for the relationship with the household head. so rel.head=1 is the household head, rel.head=2 is the spouse, rel.head=3 is the children. Here is an example to see how it looks like: df<-data.frame(ID=c("17100", "17100", "17101", "17102", "17103", "17103", "17104", "17104", "17104", "17105", "17105"), rel.head=c("1","3","1","1","1", "2", "1", "2", "3", "1", "3")) I want to add a dummy variable that is equal to 1 when these conditions held simultaneously : a) the number of rows with same ID is equal to 2 b) the variable rel.head=1 and rel.head=3 So my ideal output is:
      ID rel.head added.dummy
1  17100        1           1
2  17100        3           1
3  17101        1           0
4  17102        1           0
5  17103        1           0
6  17103        2           0
7  17104        1           0
8  17104        2           0
9  17104        3           0
10 17105        1           1
11 17105        3           1
Is there a simple way to do that? Can somebody help? Thanks in advance, Grazia ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
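One more base-R variant follows, offered as a sketch and not as a replacement for the other solutions in the thread. It computes one flag per household with tapply() and then maps each flag back to the rows of that household. Because it uses setequal(), the result does not depend on the order of the two rows within a household. The helper name is_head_child_pair is hypothetical:

```r
df <- data.frame(ID = c("17100", "17100", "17101", "17102", "17103", "17103",
                        "17104", "17104", "17104", "17105", "17105"),
                 rel.head = c("1", "3", "1", "1", "1", "2", "1", "2", "3", "1", "3"))

# TRUE when a household has exactly two members: one head ("1") and one child ("3")
is_head_child_pair <- function(rel) {
  length(rel) == 2 && setequal(as.character(rel), c("1", "3"))
}

# One logical flag per household ID, named by ID
flag_by_id <- tapply(df$rel.head, df$ID, is_head_child_pair)

# Look up each row's household flag and store it as 0/1
df$added.dummy <- as.integer(flag_by_id[as.character(df$ID)])
df$added.dummy  # 1 1 0 0 0 0 0 0 0 1 1
```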
________________________________________________________________________ This e-mail has been scanned for all viruses by Star.\ _...{{dropped:12}} From jdnewmil at dcn.davis.ca.us Tue Oct 4 18:45:57 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Tue, 04 Oct 2011 09:45:57 -0700 Subject: [R] Problem with .C In-Reply-To: <4E8AF632.1030803@mathematik.uni-marburg.de> References: <4D9AE45D.70205@mathematik.uni-marburg.de> <4E8AF632.1030803@mathematik.uni-marburg.de> Message-ID: <4660b119-192b-4ff1-bbdf-39483129ab8f@email.android.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Tue Oct 4 18:49:58 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 04 Oct 2011 18:49:58 +0200 Subject: [R] Giant font on the R plots... In-Reply-To: <1317719413281-3870335.post@n4.nabble.com> References: <1317719413281-3870335.post@n4.nabble.com> Message-ID: <4E8B3936.4040801@statistik.tu-dortmund.de> On 04.10.2011 11:10, D.Emad wrote: > Hello, > > I've been facing a really stupid problem... When I try to plot using > heatplot or hclust or any similar function, the labels of the x-axis - which > are the sample names - are giant and overlapping. I can't even read the > sample names!
R> heatplot
Error: object 'heatplot' not found
R> hclust(dist(USArrests), "ave") # does not plot anything
So let me try
R> plot(hclust(dist(USArrests), "ave")) # no x axis
Do you mean the labels on the dendrogram? These are controlled by cex (rather than cex.lab). Uwe Ligges > > I tried cex.lab = 0.5, it helped only with the y axis and not the x-axis... > Any help please?! > > -- > View this message in context: http://r.789695.n4.nabble.com/Giant-font-on-the-R-plots-tp3870335p3870335.html > Sent from the R help mailing list archive at Nabble.com.
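Here is a minimal sketch of Uwe's suggestion, based on the USArrests data from his example. heatplot() is not part of base R, so stats::heatmap() is used in its place; its cexRow and cexCol arguments set the size of the row and column labels. The plots go to a temporary PDF file only so that the example runs without a screen device:

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

hc <- hclust(dist(USArrests), method = "average")
plot(hc, cex = 0.6)  # cex shrinks the state names along the bottom of the dendrogram

# Heatmap with smaller column (x-axis) and row labels
heatmap(as.matrix(USArrests), cexRow = 0.5, cexCol = 0.8)

dev.off()
```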
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jason at joines.org Tue Oct 4 18:51:18 2011 From: jason at joines.org (Jason Paul Joines) Date: Tue, 04 Oct 2011 11:51:18 -0500 Subject: [R] number of analogs in significance test of MAT reconstructions using randomTF from palaeoSig Message-ID: <4E8B3986.8040408@joines.org> I'm trying to use the randomTF function from package palaeoSig to test the significance of a MAT reconstruction with nine analogs and a WA-PLS reconstruction with four components. I'm probably missing something obvious here, but how do I make sure that randomTF is testing the reconstruction based on the desired number of analogs / components? In:
fitmap.wapls = WAPLS( lumapspc, lumap)
sig.wapls = randomTF( spp = sqrt( lumapspc ), env = lumapenv, fos = sqrt( hcspc ), n = 999, fun = WAPLS, col = 4 )
I assume "col = 4" tells randomTF to test the reconstruction based on the four-component WA-PLS model, as that's what the documentation seems to indicate. However, in:
fitmap.mat = MAT( lumapspc, lumap, dist.method = "chord", k = 20 )
sig.mat = randomTF( spp = lumapspc, env = lumapenv, fos = hcspc, n = 999, fun = MAT, col = 9 )
it seems that "col = 9" does not tell randomTF to test the reconstruction based on the 9-analog MAT model. If I give col a value other than one or two, I get a "subscript out of bounds" error. So I assume the col argument in this case selects between the mean and weighted mean predictions. If I pass the additional arguments k = 9 and dist.method = "chord" to randomTF, then the values of sig.mat$preds do not match the values obtained from:
predmap.mat = predict( fitmap.mat, hcspc, k = 9 )
Also, if I give randomTF a k value less than 5, I get the error "k out of range".
So, passing k to randomTF must not be telling randomTF to use that number of analogs, as I would not be able to select a four-analog model. Jason =========== From Thomas.Adams at noaa.gov Tue Oct 4 19:01:11 2011 From: Thomas.Adams at noaa.gov (Thomas.Adams at noaa.gov) Date: Tue, 04 Oct 2011 13:01:11 -0400 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: References: <4E89EFE3.5050804@noaa.gov> Message-ID: Hadley, Thanks for responding. No, not smoothed quantile regression. If you go here: http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored squares, you can see we have 'boxplots'. What I want to express is the uncertainty as depicted in the example from my previous email, where I can specify the limits calculated for the 'boxplots' (the 5%, 25%, 75% and 95% limits), as we have with the 'boxplots'. Tom ----- Original Message ----- From: Hadley Wickham Date: Tuesday, October 4, 2011 10:23 am Subject: Re: [R] Question about ggplot2 and stat_smooth To: Thomas Adams Cc: R-help forum > On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams > wrote: > > I'm interested in creating a graphic -like- this: > >
> > c <- ggplot(mtcars, aes(qsec, wt))
> > c + geom_point() + stat_smooth(fill="blue", colour="darkblue", size=2, alpha = 0.2)
> > > > but I need to show 2 sets of bands (with different shading) using > 5%, 25%, > > 75%, 95% limits that I specify and where the heavy blue line is the > median. > > I don't understand how to do this with ggplot2. > > Exactly what sort of limits do you want? It sounds like maybe you are > looking for smoothed quantile regression. > > Hadley > > -- > Assistant Professor / Dobelman Family Junior Chair > Department of Statistics / Rice University > From djmuser at gmail.com Tue Oct 4 19:02:38 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 4 Oct 2011 10:02:38 -0700 Subject: [R] adding a dummy variable...
In-Reply-To: <1134.151.100.3.131.1317743071.squirrel@www.stat.columbia.edu> References: <1134.151.100.3.131.1317743071.squirrel@www.stat.columbia.edu> Message-ID: Hi: Here's another way to do it with the plyr package, also not terribly elegant. It assumes that rel.head is a factor in your original data frame: > str(df) 'data.frame': 11 obs. of 2 variables: $ ID : Factor w/ 6 levels "17100","17101",..: 1 1 2 3 4 4 5 5 5 6 ... $ rel.head: Factor w/ 3 levels "1","2","3": 1 3 1 1 1 2 1 2 3 1 ... If this is not the case in your data, then you need to modify the function f below accordingly. (This is why use of dput() is preferred when sending example data to R-help, BTW.) library('plyr') f <- function(d) { tvec <- factor(c(1, 3), levels = 1:3) # target vector if(nrow(d) != 2L) {d$dummy <- rep(0, nrow(d)); return(d)} # If the first if statement is FALSE, then the following code is run: d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1) d } ddply(df, .(ID), f) ID rel.head dummy 1 17100 1 1 2 17100 3 1 3 17101 1 0 4 17102 1 0 5 17103 1 0 6 17103 2 0 7 17104 1 0 8 17104 2 0 9 17104 3 0 10 17105 1 1 11 17105 3 1 HTH, Dennis On Tue, Oct 4, 2011 at 8:44 AM, wrote: > Hi all, > > I have a dataset of individuals where the variable ID corresponds to the > identification of the household where the individual lives. rel.head stands > for the relationship with the household head. so rel.head=1 is the household > head, rel.head=2 is the spouse, rel.head=3 is the children. > > Here is an example to see how it looks like: > > df<-data.frame(ID=c("17100", "17100", "17101", "17102", "17103", "17103", > "17104", "17104", "17104", "17105", "17105"), > rel.head=c("1","3","1","1","1", "2", "1", "2", "3", "1", "3")) > > > I want to add a dummy variable that is equal to 1 when these conditions > held simultaneously : > > a) the number of rows with same ID is equal to 2 > b) the variable rel.head=1 and rel.head=3 > > > So my ideal output is:
> > ID rel.head added.dummy > 1 17100 1 1 > 2 17100 3 1 > 3 17101 1 0 > 4 17102 1 0 > 5 17103 1 0 > 6 17103 2 0 > 7 17104 1 0 > 8 17104 2 0 > 9 17104 3 0 > 10 17105 1 1 > 11 17105 3 1 > > Is there a simple way to do that? > Can somebody help? > > Thanks in advance, > Grazia > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From vioravis at gmail.com Tue Oct 4 19:17:01 2011 From: vioravis at gmail.com (vioravis) Date: Tue, 4 Oct 2011 10:17:01 -0700 (PDT) Subject: [R] Reading stopwords from a csv file Message-ID: <1317748621872-3871697.post@n4.nabble.com> I am using the tm package to do text mining. I have a huge list of stopwords (2000+) that are in a csv file. I read it as follows: stopwordlist <- read.csv("stopwords to be Removed 10042011.csv") myStopwords <- as.character(stopwordlist$stopwords) When I try removing the stopwords using tr1=tm_map(tr1,removeWords,myStopwords) I get the following error: Error in gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "", : internal error in compiling regexp However, this works fine when I define myStopwords = c(....) instead of reading from the csv file. Can someone please help me to resolve this issue? Thank you. Ravi -- View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871697.html Sent from the R help mailing list archive at Nabble.com.
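The error message shows that removeWords() pastes every stopword into a single pattern of the form \b(word1|word2|...)\b and hands it to gsub(); with 2000+ words that one pattern is most likely too large for the regex engine to compile, which would also explain why a short hand-typed c(....) list works. A minimal base-R sketch of a common workaround is to split the word list into chunks and run one gsub() per chunk. The helper `remove_words_chunked` and its `chunk_size` default are hypothetical, and the sketch assumes the stopwords are plain words without regex metacharacters.

```r
# Hypothetical helper: remove whole-word matches of `words` from `text`,
# compiling one moderately sized regex per chunk instead of one giant regex.
# Assumes `words` are plain words without regex metacharacters.
remove_words_chunked <- function(text, words, chunk_size = 200) {
  # Drop NA and empty entries, which would otherwise add "NA" or an
  # empty alternative to the pattern
  words <- unique(words[!is.na(words) & nzchar(words)])
  # Group the words into consecutive chunks of at most `chunk_size`
  chunks <- split(words, ceiling(seq_along(words) / chunk_size))
  for (chunk in chunks) {
    # Same \b(a|b|c)\b shape as the pattern in the error message
    pattern <- sprintf("\\b(%s)\\b", paste(chunk, collapse = "|"))
    text <- gsub(pattern, "", text, perl = TRUE)
  }
  text
}

# myStopwords <- as.character(stopwordlist$stopwords)  # as in the post
docs <- c("the cat sat on the mat", "another cat")
remove_words_chunked(docs, c("the", "on"), chunk_size = 1)
```

With tm, the same idea would mean calling tm_map(tr1, removeWords, chunk) once per chunk of a few hundred words rather than once per word; that variant is a suggestion and has not been timed here.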
From djmuser at gmail.com Tue Oct 4 19:52:23 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 4 Oct 2011 10:52:23 -0700 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: References: <4E89EFE3.5050804@noaa.gov> Message-ID: Hi: The smooth is not going to replicate the quantile estimates you get from the 'boxplots'; the smooth is estimating a conditional mean using loess, with confidence limits associated with uncertainty in the estimate of the conditional mean function, which are almost certainly going to be narrower than the corresponding quantiles of the data distributions. If you want to mimic the behavior in the 'boxplots', I would save the information from them into a data frame with columns for each quantile, assign variable names to the quantiles, melt the corresponding data frame so that the quantile names become factor levels (with whatever variable is used to distinguish the 'boxplots' as the ID variable in melt()), and then use ggplot2 or lattice to plot the corresponding sets of lines. Here's an example: library('plyr') library('reshape') library('ggplot2') # Toy data frame dd <- data.frame(year = rep(2000:2008, each = 500), y = rnorm(4500)) # Function to compute quantiles and return a data frame g <- function(d) { qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95)))) names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '') qq } # Apply function to each year of data in dd: qdf <- ddply(dd, .(year), g) # melt to produce a factor variable whose levels are quantiles qdfm <- melt(qdf, id = 'year') # Use ggplot() to plot the boxplots and quantile lines: ggplot() + geom_boxplot(data = dd, aes(x = factor(year), y = y)) + geom_line(data = qdfm, aes(x = factor(year), y = value, group = variable, colour = variable), size = 1) + labs(x = 'Year', colour = 'Quantile') The idea of superimposing the lines over the boxplots is to show that the default method of quantile() corresponds to the quantile() method used to generate boxplots in ggplot2.
Is that closer to what you're after? If you want, you can always use geom_ribbon() to shade the areas between the lines and scale_colour_manual() to manually specify the line colors. Using the above example, here's one way, using the unmelted quantile data: ggplot(qdf, aes(x = year, y = Q50)) + geom_line(size = 2, color = 'navyblue') + geom_ribbon(aes(ymin = Q25, ymax = Q75), fill = 'blue', alpha = 0.4) + geom_ribbon(aes(ymin = Q5, ymax = Q25), fill = 'blue', alpha = 0.2) + geom_ribbon(aes(ymin = Q75, ymax = Q95), fill = 'blue', alpha = 0.2) + labs(x = 'Year', y = 'Y') Dennis On Tue, Oct 4, 2011 at 10:01 AM, wrote: > Hadley, > > Thanks for responding. No, not smoothed quantile regression. If you go here: http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored squares, you can see we have 'boxplots'. What I want to express is the uncertainty as depicted in the example from my previous email where I can specify the limits calculated for the 'boxplots' using 5%, 25%, 75%, 95% limits as we have with the 'boxplots'. > > Tom > > ----- Original Message ----- > From: Hadley Wickham > Date: Tuesday, October 4, 2011 10:23 am > Subject: Re: [R] Question about ggplot2 and stat_smooth > To: Thomas Adams > Cc: R-help forum > > >> On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams >> wrote: >> > I'm interested in creating a graphic -like- this: >> > >> > c <- ggplot(mtcars, aes(qsec, wt)) >> > c + geom_point() + stat_smooth(fill="blue", colour="darkblue", >> size=2, alpha >> > = 0.2) >> > >> > but I need to show 2 sets of bands (with different shading) using >> 5%, 25%, >> > 75%, 95% limits that I specify and where the heavy blue line is the >> median. >> > I don't understand how to do this with ggplot2. >> >> Exactly what sort of limits do you want? It sounds like maybe you are >> looking for smoothed quantile regression.
>> >> Hadley >> >> -- >> Assistant Professor / Dobelman Family Junior Chair >> Department of Statistics / Rice University >> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From rshepard at appl-ecosys.com Tue Oct 4 20:39:31 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Tue, 4 Oct 2011 11:39:31 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows Message-ID: I have a data frame called chemdata with this structure: > str(chemdata) 'data.frame': 14886 obs. of 4 variables: $ site : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 128 124 2 3 3 ... $ sampdate: Date, format: "1996-12-27" "1996-08-22" ... $ param : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ... $ quant : num 0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ... I've looked in the R Cookbook and Dalgaard's intro book without finding a way to use wildcards (e.g., like "BC-*") or explicitly writing each site ID when subsetting a data frame. I need to create subsets (as data frames) based on sites, but including all sites on each stream. For example, using the initial site factor shown above, I want a subset containing all data for sites "BC-0.5", "BC-1", "BC-2", "BC-3", "BC-4", "BC-5", and "BC-6".
Pointers appreciated, Rich From sarah.goslee at gmail.com Tue Oct 4 20:46:08 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Tue, 4 Oct 2011 14:46:08 -0400 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: Hi Rich, You can use something like this: > testdata <- c("A1", "A2", "A3", "B1", "B2", "B3") > grep("^A", testdata) [1] 1 2 3 > grepl("^A", testdata) [1] TRUE TRUE TRUE FALSE FALSE FALSE Sarah On Tue, Oct 4, 2011 at 2:39 PM, Rich Shepard wrote: > I have a data frame called chemdata with this structure: > >> str(chemdata) > > 'data.frame': 14886 obs. of 4 variables: > $ site : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 > 128 124 2 3 3 ... > $ sampdate: Date, format: "1996-12-27" "1996-08-22" ... > $ param : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ... > $ quant : num 0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ... > > I've looked in the R Cookbook and Dalgaard's intro book without finding a > way to use wildcards (e.g., like "BC-*") or explicitly writing each site ID > when subsetting a data frame. > > I need to create subsets (as data frames) based on sites, but including > all sites on each stream. For example, using the initial site factor shown > above, I want a subset containing all data for sites "BC-0.5", "BC-1", > "BC-2", "BC-3", "BC-4", "BC-5", and "BC-6". > > Pointers appreciated, > > Rich > -- Sarah Goslee http://www.functionaldiversity.org From michael.weylandt at gmail.com Tue Oct 4 20:46:52 2011 From: michael.weylandt at gmail.com (R.
Michael Weylandt) Date: Tue, 4 Oct 2011 14:46:52 -0400 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: This isn't going to be the most elegant, but it should work: ## Get the factors as characters ff <- as.character(chemdata$site) ## Identify those that match what you want ff <- grepl(ff, "BC-") Now use this logical vector to subset: chemdata[ff, ] Can't test, but should be good to go assuming that "BC-" entirely identifies those sites you want. If you have other "BC-" things, read through the ?regex documentation; I think it describes how to do selective wildcards. Michael On Tue, Oct 4, 2011 at 2:39 PM, Rich Shepard wrote: > I have a data frame called chemdata with this structure: > >> str(chemdata) > > 'data.frame': 14886 obs. of 4 variables: > $ site : Factor w/ 148 levels "BC-0.5","BC-1",..: 104 145 126 115 114 > 128 124 2 3 3 ... > $ sampdate: Date, format: "1996-12-27" "1996-08-22" ... > $ param : Factor w/ 8 levels "As","Ca","Cl",..: 1 1 1 1 1 1 1 1 1 1 ... > $ quant : num 0.06 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ... > > I've looked in the R Cookbook and Dalgaard's intro book without finding a > way to use wildcards (e.g., like "BC-*") or explicitly writing each site ID > when subsetting a data frame. > > I need to create subsets (as data frames) based on sites, but including > all sites on each stream. For example, using the initial site factor shown > above, I want a subset containing all data for sites "BC-0.5", "BC-1", > "BC-2", "BC-3", "BC-4", "BC-5", and "BC-6". > > Pointers appreciated, > > Rich > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
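grepl() takes the pattern first and the text to search second (grepl(pattern, x)), so the call in the reply above needs its two arguments swapped. A minimal sketch of the prefix-based subset Rich asks for, run on a small made-up stand-in for chemdata (only the site and quant columns, with invented values). The `stream` column at the end is one way to derive the stream name from the site code inside R instead of adding it in the database, assuming every site code has the form STREAM-number.

```r
# Made-up stand-in for chemdata (values are illustrative only)
chemdata <- data.frame(
  site  = factor(c("BC-0.5", "BC-1", "SC-2", "BC-6", "TC-1")),
  quant = c(0.06, 0.01, 0.02, 0.01, 0.03)
)

# grepl(pattern, x): "^BC-" matches site codes that start with "BC-"
is_bc <- grepl("^BC-", chemdata$site)
bc_data <- chemdata[is_bc, ]

# The same subset with subset(); droplevels() discards unused site levels
bc_data2 <- droplevels(subset(chemdata, grepl("^BC-", site)))

# Alternative: take the stream as the text before the first "-" and
# split the data frame into one data frame per stream
chemdata$stream <- sub("-.*$", "", as.character(chemdata$site))
by_stream <- split(chemdata, chemdata$stream)
names(by_stream)
```

The anchored pattern "^BC-" avoids accidentally matching a code such as "XBC-1" that merely contains "BC-".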
> From rshepard at appl-ecosys.com Tue Oct 4 20:58:15 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Tue, 4 Oct 2011 11:58:15 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Sarah Goslee wrote: > You can use something like this: > >> testdata <- c("A1", "A2", "A3", "B1", "B2", "B3") >> grep("^A", testdata) > [1] 1 2 3 >> grepl("^A", testdata) > [1] TRUE TRUE TRUE FALSE FALSE FALSE Sarah, I don't see how this gives me a data frame containing only those sites I specify. I want to plot by sites-within-streams specifying which param factor to use. Thanks, Rich From sarah.goslee at gmail.com Tue Oct 4 21:03:27 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Tue, 4 Oct 2011 15:03:27 -0400 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: Hi Rich, On Tue, Oct 4, 2011 at 2:58 PM, Rich Shepard wrote: > On Tue, 4 Oct 2011, Sarah Goslee wrote: > >> You can use something like this: >> >>> testdata <- c("A1", "A2", "A3", "B1", "B2", "B3") >>> grep("^A", testdata) >> >> [1] 1 2 3 >>> >>> grepl("^A", testdata) >> >> [1] TRUE TRUE TRUE FALSE FALSE FALSE > > Sarah, > > I don't see how this gives me a data frame containing only those sites I > specify. I want to plot by sites-within-streams specifying which param > factor to use. You asked for pointers, and didn't provide a reproducible example, so I offered a pointer. If you have a logical vector that specifies whether to include or omit a row, you can use that to subset your data frame. sitesToUse <- grepl("firstsite", mydata$mysitenames) dataframeForThatSite <- mydata[sitesToUse, ] If you want real worked results, you'll need to provide a reproducible example of your own.
Sarah -- Sarah Goslee http://www.functionaldiversity.org From rshepard at appl-ecosys.com Tue Oct 4 21:07:10 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Tue, 4 Oct 2011 12:07:10 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, R. Michael Weylandt wrote: > This isn't going to be the most elegant, but it should work: > ## Get the factors as characters > ff <- as.character(chemdata$site) >> ## Identify those that match what you want > ff <- grepl(ff, "BC-") Michael, Apparently grep works differently in R than it does on the command line: bf <- grep(ff, "BC-") Warning message: In grep(ff, "BC-") : argument 'pattern' has length > 1 and only the first element will be used I understand what you suggest but it does not appear to work for me. Thanks, Rich From jdnewmil at dcn.davis.ca.us Tue Oct 4 21:08:50 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Tue, 04 Oct 2011 12:08:50 -0700 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From baptiste.auguie at googlemail.com Tue Oct 4 21:09:02 2011 From: baptiste.auguie at googlemail.com (baptiste auguie) Date: Wed, 5 Oct 2011 08:09:02 +1300 Subject: [R] adding a dummy variable... In-Reply-To: References: <1134.151.100.3.131.1317743071.squirrel@www.stat.columbia.edu> Message-ID: Hi, Using ddply, ddply(df, .(ID), mutate, nrows=length(rel.head), test = nrows==2 & all(rel.head %in% c(1,3))) HTH, baptiste On 5 October 2011 06:02, Dennis Murphy wrote: > Hi: > > Here's another way to do it with the plyr package, also not terribly > elegant. It assumes that rel.head is a factor in your original data > frame: >> str(df) > 'data.frame': 11 obs. of 2 variables: > $ ID : Factor w/ 6 levels "17100","17101",..: 1 1 2 3 4 4 5 5 5 6 ...
> $ rel.head: Factor w/ 3 levels "1","2","3": 1 3 1 1 1 2 1 2 3 1 ... > > If this is not the case in your data, then you need to modify the > function f below accordingly. (This is why use of dput() is preferred > when sending example data to R-help, BTW.) > > library('plyr') > f <- function(d) { > tvec <- factor(c(1, 3), levels = 1:3) # target vector > if(nrow(d) != 2L) {d$dummy <- rep(0, nrow(d)); return(d)} > # If the first if statement is FALSE, then the following code is run: > d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1) > d > } > > ddply(df, .(ID), f) > > ID rel.head dummy > 1 17100 1 1 > 2 17100 3 1 > 3 17101 1 0 > 4 17102 1 0 > 5 17103 1 0 > 6 17103 2 0 > 7 17104 1 0 > 8 17104 2 0 > 9 17104 3 0 > 10 17105 1 1 > 11 17105 3 1 > > HTH, > Dennis > > On Tue, Oct 4, 2011 at 8:44 AM, wrote: >> Hi all, >> >> I have a dataset of individuals where the variable ID corresponds to the >> identification of the household where the individual lives. rel.head stands >> for the relationship with the household head. so rel.head=1 is the household >> head, rel.head=2 is the spouse, rel.head=3 is the children. >> >> Here is an example to see how it looks like: >> >> df<-data.frame(ID=c("17100", "17100", "17101", "17102", "17103", "17103", >> "17104", "17104", "17104", "17105", "17105"), >> rel.head=c("1","3","1","1","1", "2", "1", "2", "3", "1", "3")) >> >> >> I want to add a dummy variable that is equal to 1 when these conditions >> held simultaneously : >> >> a) the number of rows with same ID is equal to 2 >> b) the variable rel.head=1 and rel.head=3 >> >> >> So my ideal output is:
>> >> ID rel.head added.dummy >> 1 17100 1 1 >> 2 17100 3 1 >> 3 17101 1 0 >> 4 17102 1 0 >> 5 17103 1 0 >> 6 17103 2 0 >> 7 17104 1 0 >> 8 17104 2 0 >> 9 17104 3 0 >> 10 17105 1 1 >> 11 17105 3 1 >> >> Is there a simple way to do that? >> Can somebody help? >> >> Thanks in advance, >> Grazia >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From gunter.berton at gene.com Tue Oct 4 21:10:12 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 4 Oct 2011 12:10:12 -0700 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Tue Oct 4 21:10:37 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Tue, 4 Oct 2011 15:10:37 -0400 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: No, that was just a typo on my end: the correct order of arguments should have been ff <- grepl("BC-", ff) On Tue, Oct 4, 2011 at 3:07 PM, Rich Shepard wrote: > On Tue, 4 Oct 2011, R.
Michael Weylandt wrote: > >> This isn't going to be the most elegant, but it should work: >> ## Get the factors as characters >> ff <- as.character(chemdata$site) >>> >>> ## Identify those that match what you want >> >> ff <- grepl(ff, "BC-") > > Michael, > > Apparently grep works differently in R than it does on the command line: > > bf <- grep(ff, "BC-") > Warning message: > In grep(ff, "BC-") : > argument 'pattern' has length > 1 and only the first element will be used > > I understand what you suggest but it does not appear to work for me. > > Thanks, > > Rich > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From hadley at rice.edu Tue Oct 4 21:10:47 2011 From: hadley at rice.edu (Hadley Wickham) Date: Tue, 4 Oct 2011 14:10:47 -0500 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: References: <4E89EFE3.5050804@noaa.gov> Message-ID: > # Function to compute quantiles and return a data frame > g <- function(d) { > qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95)))) > names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '') > qq } You could cut out the melt step by making this return a long data frame: g <- function(d, qs = c(.05, .25, .50, .75, .95)) { data.frame(q = qs, value = quantile(d$y, qs)) } Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ From cmcclure at atrcorp.com Tue Oct 4 20:25:37 2011 From: cmcclure at atrcorp.com (Charles McClure) Date: Tue, 4 Oct 2011 14:25:37 -0400 Subject: [R] Tinn-R Message-ID: <009201cc82c3$04424fb0$7800a8c0@ATRPDC.ATRColumbia.Com> I am new to R and have recently tried Tinn-R with very mixed and unexpected results.
Can you point me to a Tinn-R tutorial on the web or a decent reference book? Thank you for your help; Charles McClure cmcclure at atrcorp.com cfmcclure at verizon.net From marcus.nunes at gmail.com Tue Oct 4 20:17:42 2011 From: marcus.nunes at gmail.com (Marcus Nunes) Date: Tue, 4 Oct 2011 14:17:42 -0400 Subject: [R] F-values in nested designs Message-ID: Hello all, I'm trying to learn how to fit a nested model in R. I found a toy example on the internet: a dataset that has 3 areas and 4 sites within these areas. When I use Minitab to fit a nested model to this data, this is the ANOVA table that I get: Nested ANOVA: y versus areas, sites Analysis of Variance for y Source DF SS MS F P areas 2 4.5000 2.2500 0.158 0.856 sites 9 128.2500 14.2500 3.167 0.012 Error 24 108.0000 4.5000 Total 35 240.7500 When I use R, this is the ANOVA table that I get: summary(aov(y ~ areas + Error(areas%in%sites))) Error: areas:sites Df Sum Sq Mean Sq F value Pr(>F) areas 2 4.50 2.25 0.1579 0.8563 Residuals 9 128.25 14.25 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 24 108 4.5 Warning message: In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular The results are the same, except for one F-value, and I don't understand why. Hence, these are my questions: 1) I searched Google and couldn't find a reason for this warning in my code. Why is this happening? 2) Why don't I have an F-value for the nested effect? I realize that R calls it Residuals in the first part of the summary, but is there a way to make R consider it as another factor? INB4: if I have a nested design with treatment A and treatment B within A, F-values are MSA/MSA(B) and MSA(B)/MSE, correct? How can I make R give these values directly, without further coding? Thanks for your help. Below is my code and information about my system.
---------------------- y = c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14, 11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4) areas = as.factor(rep(c("m1", "m2", "m3"), each=12)) #sites = as.factor(c(rep(c(1, 2, 3, 4), 3), rep(c(5, 6, 7, 8), 3), rep(c(9, 10, 11, 12), 3))) sites = as.factor(c(rep(c(1, 2, 3, 4), 9))) repl = as.factor(rep(c(1, 2, 3), each=4, 3)) summary(aov(y ~ areas + Error(areas%in%sites))) summary(aov(y ~ areas + Error(areas%in%sites))) Error: areas:sites Df Sum Sq Mean Sq F value Pr(>F) areas 2 4.50 2.25 0.1579 0.8563 Residuals 9 128.25 14.25 Error: Within Df Sum Sq Mean Sq F value Pr(>F) Residuals 24 108 4.5 Warning message: In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular sessionInfo() R version 2.13.1 Patched (2011-08-25 r56798) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] car_2.0-11 survival_2.36-9 nnet_7.3-1 [4] MASS_7.3-14 lme4_0.999375-40 Matrix_0.999375-50 [7] lattice_0.19-33 nlme_3.1-102 loaded via a namespace (and not attached): [1] grid_2.13.1 stats4_2.13.1 tools_2.13.1 -- Marcus Nunes marcus.nunes at gmail.com From vioravis at gmail.com Tue Oct 4 20:15:27 2011 From: vioravis at gmail.com (vioravis) Date: Tue, 4 Oct 2011 11:15:27 -0700 (PDT) Subject: [R] Reading stopwords from a csv file In-Reply-To: <1317748621872-3871697.post@n4.nabble.com> References: <1317748621872-3871697.post@n4.nabble.com> Message-ID: <1317752127976-3871864.post@n4.nabble.com> The following for loop does the work but it takes a good 30 minutes to run: for(i in 1:length(myStopwords)) { currentWord <- myStopwords[i] tr1=tm_map(tr1,removeWords,currentWord) } Are there any faster alternatives? Thank you.
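Returning to the "F-values in nested designs" question above: summary() on an aov fit tests every term against the residual mean square, whereas in this nested design the areas term is tested against the sites-within-areas mean square (MSA/MSA(B), as the post says). A minimal sketch using the data from that post: fit the nested formula y ~ areas/sites, which expands to areas + areas:sites, and form both F ratios by hand from the ANOVA table. This is the classical fixed-effects calculation, not a mixed-model fit, and the object names are illustrative.

```r
# Data from the "F-values in nested designs" post
y <- c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11,
       11, 13, 9, 10, 14, 11, 10, 9, 8, 9, 8, 8,
       13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4)
areas <- factor(rep(c("m1", "m2", "m3"), each = 12))
sites <- factor(rep(1:4, 9))  # site labels are reused within each area

# areas/sites expands to areas + areas:sites (sites nested in areas)
fit <- aov(y ~ areas / sites)
tab <- summary(fit)[[1]]  # rows in order: areas, areas:sites, Residuals
ms  <- tab[["Mean Sq"]]
dof <- tab[["Df"]]

# Test areas against sites-within-areas, and sites against the residual
F_areas <- ms[1] / ms[2]
F_sites <- ms[2] / ms[3]
p_areas <- pf(F_areas, dof[1], dof[2], lower.tail = FALSE)
p_sites <- pf(F_sites, dof[2], dof[3], lower.tail = FALSE)
round(c(F_areas = F_areas, p_areas = p_areas,
        F_sites = F_sites, p_sites = p_sites), 3)
```

With this data the mean squares are 2.25, 14.25 and 4.5, so the ratios are 2.25/14.25 (about 0.158) for areas and 14.25/4.5 (about 3.167) for sites, the same F values as in the Minitab table in the post. Rows are picked by position because summary() pads the row names with spaces.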
Ravi -- View this message in context: http://r.789695.n4.nabble.com/Reading-stopwords-from-a-csv-file-tp3871697p3871864.html Sent from the R help mailing list archive at Nabble.com. From michael.weylandt at gmail.com Tue Oct 4 21:39:19 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Tue, 4 Oct 2011 15:39:19 -0400 Subject: [R] a question about sort and BH In-Reply-To: References: Message-ID: On Mon, Oct 3, 2011 at 10:08 PM, chunjiang he wrote: > Hi, > > I have two questions want to ask. > > 1. If I have a matrix like this, and I want to figure out the rows whose > value in the 3rd column are less than 0.05. How can I do it with R. > hsa-let-7a--MBTD1 0.528239197 2.41E-05 > hsa-let-7a--APOBEC1 0.507869409 5.51E-05 > hsa-let-7a--PAPOLA 0.470451884 0.000221774 > hsa-let-7a--NF2 0.469280186 0.000231065 > hsa-let-7a--SLC17A5 0.454597978 0.000381713 > hsa-let-7a--THOC2 0.447714054 0.000479322 > hsa-let-7a--SMG7 0.444972282 0.000524129 > Suppose your data is "d": then try which(d[,3] < 0.05) > 2. I got the p.adjust.R from R source. In the method "BH", I am not clear > with the code: > i <- lp:1L # Just the same as seq(lp, 1 , by = -1) > o <- order(p, decreasing = TRUE) > ro <- order(o) > pmin(1, cummin( n / i * p[o] ))[ro] # pmin does parallel minimums, p[o] is the same as sort(p, decreasing = TRUE), and indexing by [ro] puts the output values back in the order they went in. As an exercise, I'd suggest you get the original paper, see how the calculation is done there, implement it in R as best you can, even if it seems loop-y, and refine it down to R Core's implementation. One of the best ways I know to learn to think vectorwise. Sorry I can't help more, but I don't know the method so I don't want to read too much into the code and say something that I haven't thought through (Lord knows I do that enough on this list!!) Michael > > How to explain the first and the fourth row.
> ====================p.adjust.R======================================= > p.adjust.methods <- > c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none") > p.adjust <- function(p, method = p.adjust.methods, n = length(p)) > { > ## Methods 'Hommel', 'BH', 'BY' and speed improvements contributed by > ## Gordon Smyth . > method <- match.arg(method) > if(method == "fdr") method <- "BH" # back compatibility > nm <- names(p) > p <- as.numeric(p); names(p) <- nm > p0 <- p > if(all(nna <- !is.na(p))) nna <- TRUE > p <- p[nna] > lp <- length(p) > stopifnot(n >= lp) > if (n <= 1) return(p0) > if (n == 2 && method == "hommel") method <- "hochberg" > p0[nna] <- > switch(method, > bonferroni = pmin(1, n * p), > holm = { > i <- seq_len(lp) > o <- order(p) > ro <- order(o) > pmin(1, cummax( (n - i + 1L) * p[o] ))[ro] > }, > hommel = { ## needs n-1 >= 2 in for() below > if(n > lp) p <- c(p, rep.int(1, n-lp)) > i <- seq_len(n) > o <- order(p) > p <- p[o] > ro <- order(o) > q <- pa <- rep.int( min(n*p/i), n) > for (j in (n-1):2) { > ij <- seq_len(n-j+1) > i2 <- (n-j+2):n > q1 <- min(j*p[i2]/(2:j)) > q[ij] <- pmin(j*p[ij], q1) > q[i2] <- q[n-j+1] > pa <- pmax(pa,q) > } > pmax(pa,p)[if(lp < n) ro[1:lp] else ro] > }, > hochberg = { > i <- lp:1L > o <- order(p, decreasing = TRUE) > ro <- order(o) > pmin(1, cummin( (n - i + 1L) * p[o] ))[ro] > }, > BH = { > i <- lp:1L > o <- order(p, decreasing = TRUE) > ro <- order(o) > pmin(1, cummin( n / i * p[o] ))[ro] > }, > BY = { > i <- lp:1L > o <- order(p, decreasing = TRUE) > ro <- order(o) > q <- sum(1L/(1L:n)) > pmin(1, cummin(q * n / i * p[o]))[ro] > }, > none = p) >
p0 > } > ============================================================ > > > I wrote a code to do my work in BH correction like the following: > > rm(list=ls()) > a<-read.csv("test.txt",sep="\t",header=F,quote="") > b<-a[order(a[,3],decreasing=TRUE),] > c<-p.adjust(b[,3],method="BH") > b[,4]<-c > write.table(b,"zz.txt",sep="\t") > > Is that right? Thanks for all. > > Jiang > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From rshepard at appl-ecosys.com Tue Oct 4 21:40:51 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Tue, 4 Oct 2011 12:40:51 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Sarah Goslee wrote: > You asked for pointers, and didn't provide a reproducible example, so I > offered a pointer. Sarah, I did not realize that your pointer was to the factor component of the subset() command. I think the most parsimonious thing for me to do is to modify the database table with a new column of the full stream name, then re-export and re-read into R. Thanks, Rich From rshepard at appl-ecosys.com Tue Oct 4 21:43:20 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Tue, 4 Oct 2011 12:43:20 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, R. Michael Weylandt wrote: > No, that was just a typo on my end: > the correct order of arguments should have been > ff <- grepl("BC-", ff) Michael, Thank you.
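For the "a question about sort and BH" thread above: a step-by-step sketch of the BH branch of the quoted p.adjust() source, simplified to n = length(p) with no NA handling. order(p, decreasing = TRUE) lists the p-values from largest to smallest, so the k-th of them has rank i = n - k + 1 among the ascending p-values, which is what i <- lp:1L encodes; n / i * p[o] is the BH quantity n p_(i) / i; cummin() makes the adjusted values monotone; and ro <- order(o) is the inverse permutation, so indexing by [ro] returns the results in input order. `bh_adjust` is a hypothetical name, and the comparison with p.adjust() is the check.

```r
# Hypothetical re-implementation of the "BH" branch of p.adjust(),
# assuming no NAs and n = length(p)
bh_adjust <- function(p) {
  n  <- length(p)
  i  <- n:1                          # ascending ranks of p[o]
  o  <- order(p, decreasing = TRUE)  # indices from largest to smallest p
  ro <- order(o)                     # inverse permutation: undoes o
  adj <- cummin(n / i * p[o])        # BH values, forced to be monotone
  pmin(1, adj)[ro]                   # cap at 1 and restore input order
}

# Third-column p-values from the post, plus an unsorted toy vector
p1 <- c(2.41e-05, 5.51e-05, 0.000221774, 0.000231065,
        0.000381713, 0.000479322, 0.000524129)
p2 <- c(0.04, 0.01, 0.03, 0.2, 0.002)

all.equal(bh_adjust(p1), p.adjust(p1, method = "BH"))
all.equal(bh_adjust(p2), p.adjust(p2, method = "BH"))
```

Because p.adjust() returns its values in the same order as its input, sorting the data frame before calling it, as in the posted script, is not required, although it does no harm.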
Rich From jbustosmelo at yahoo.es Tue Oct 4 21:48:13 2011 From: jbustosmelo at yahoo.es (Jose Bustos Melo) Date: Tue, 4 Oct 2011 20:48:13 +0100 (BST) Subject: [R] joining tables Message-ID: <1317757693.33349.YahooMailNeo@web26506.mail.ukl.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Tue Oct 4 21:59:20 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Tue, 4 Oct 2011 15:59:20 -0400 Subject: [R] distance coefficient for amatrix with ngative valus In-Reply-To: <4E8AA27A.5050104@xtra.co.nz> References: <1317660942.97644.YahooMailNeo@web65917.mail.ac4.yahoo.com> <1317691425.56813.YahooMailNeo@web65904.mail.ac4.yahoo.com> <1317698873.20366.YahooMailNeo@web65906.mail.ac4.yahoo.com> <4E8AA27A.5050104@xtra.co.nz> Message-ID: You are, of course, entirely correct and, once again, I tip my hat to the erudition of those who comment on this list. My initial formulation, for a distance on a normed space inherited from the norm, stands trivially, but as you rightly point out, I'm excluding many interesting and possibly useful metrics. Follies of youth and all that.... Michael On Tue, Oct 4, 2011 at 2:06 AM, Rolf Turner wrote: > On 04/10/11 17:05, R. Michael Weylandt wrote: > > >> >> More importantly, as I said in my initial response, any distance >> metric worth its salt is translation invariant. > > > > Point of order, Mr. Chairman. (This is really *toadally* off topic; > my apologies, but I couldn't resist --- I trained as a pure mathematician). > > A *metric* need not in general be translation invariant. Indeed a metric > need not be defined on a space in which translation makes any sense. > > A metric defined in terms of a *norm* (on a normed vector space) by > rho(x,y) = ||x - y|| is of course by definition translation invariant, and > that's > what most of us think in terms of.
> > But there are perfectly ``reasonable'' metrics, defined on vector spaces, > which are not translation invariant. Whether these are ``worth their salt'' > is I suppose a matter of taste. (You should pardon the expression. :-) ) > > A simple e.g. of a non-translation-invariant metric is > > rho(x,y) = |x - y|/(1 + |x| + |y|) > > (defined on the real line). It is easily checked that rho(.,.) satisfies > the four conditions that a metric must satisfy. (Exercise for the interested > reader.) > > Note that rho(1,2) = 1/4 but rho(2,3) = 1/6, ergo not translation > invariant. > > cheers, > > Rolf Turner > From michael.weylandt at gmail.com Tue Oct 4 22:01:55 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Tue, 4 Oct 2011 16:01:55 -0400 Subject: [R] joining tables In-Reply-To: <1317757693.33349.YahooMailNeo@web26506.mail.ukl.yahoo.com> References: <1317757693.33349.YahooMailNeo@web26506.mail.ukl.yahoo.com> Message-ID: Perhaps rbind? Michael On Tue, Oct 4, 2011 at 3:48 PM, Jose Bustos Melo wrote: > Hello everyone, > > I know this is very basic question for you people. I'm working with mani diferent tables, but everyone has the same variables. (V1, V2, V3). The only think that I need to do is to put together this tables. In other words, creating just one big table with all the cases showed in the smaller tables. > For example: > > tabla1<-data.frame(v1,v2,v3) > tabla2<-data.frame(v1,v2,v3) > tabla3<-data.frame(v1,v2,v3) > tabla4<-data.frame(v1,v2,v3) > > Just want to join it together in just one table. By the way, are more that 3 Millon cases. > Thank you in advance! > José > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
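Michael's rbind() suggestion extends naturally to any number of tables. The sketch below is a minimal illustration, not José's actual data: tabla1 to tabla4 are small made-up data frames sharing the columns v1, v2 and v3. Collecting them in a list and calling rbind() once through do.call() avoids growing a data frame inside a loop.

```r
# Hypothetical stand-ins for the poster's tables; all share columns v1, v2, v3.
tabla1 <- data.frame(v1 = 1:3, v2 = c("a", "b", "c"), v3 = c(0.1, 0.2, 0.3))
tabla2 <- data.frame(v1 = 4:5, v2 = c("d", "e"), v3 = c(0.4, 0.5))
tabla3 <- data.frame(v1 = 6L, v2 = "f", v3 = 0.6)
tabla4 <- data.frame(v1 = 7:8, v2 = c("g", "h"), v3 = c(0.7, 0.8))

# Gather the tables in a list, then bind them with a single rbind() call.
tablas <- list(tabla1, tabla2, tabla3, tabla4)
combined <- do.call(rbind, tablas)

nrow(combined)  # total number of cases across all four tables
```

The data frame method of rbind() matches columns by name, so the tables need identical column names but not necessarily the same column order. For several million rows, rbindlist() from the data.table package is a commonly used faster alternative, though it is outside this sketch.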
> > From gavin.simpson at ucl.ac.uk Tue Oct 4 22:36:25 2011 From: gavin.simpson at ucl.ac.uk (Gavin Simpson) Date: Tue, 04 Oct 2011 21:36:25 +0100 Subject: [R] Adonis and nmds help and questions for a novice. In-Reply-To: References: Message-ID: <1317760585.2764.19.camel@chrysothemis.geog.ucl.ac.uk> On Tue, 2011-10-04 at 08:45 +0000, Ashley Houlden wrote: > Hi, > > forgive me if someone has already posted about this but I have had a > look and cannot find the answer, also I am very new to R and been > getting the grips with this. > > I have been trying to use Adonis to find out if there are significant > difference between groups on data that I have analyses with NMDS, and > have been struggling with getting this to work and understanding what > is going on. I am looking at diversity in different soils with either > woodland or grassland habitats. > > I have run the scripts > > library(vegan) > library(ecodist) > library(MASS) > mydata <- read.table("ash_data.csv", header=TRUE, sep=",", > row.names="Site") > > envdata_fit <- read.table("ash_env.csv", header=TRUE, sep=",", > row.names="Site") > > #distance matrix of samples using bray curtis > d= bcdist(mydata, rmzero=FALSE) > > And then using the distance matrix from this to use for adoins? Is > this correct. > > With this I have then run Adonis > > results = adonis(d ~ wood, envdata_fit, permutations = 1000) > > and get significant values to see if sig diff in diversity between > wood and grass habitat. > > However I have been reading about combining the variables, but there > seems to be different ways for example > > results = adonis(d ~ wood+soil, envdata_fit, permutations = 1000) > > so get sig values for Wood and soil > > or > > results = adonis(d ~ wood*soil, envdata_fit, permutations = 1000) > > And I get sig values for wood, soil, and wood soil interaction. 
> > This seems to make sense, however for both if I put the variable the > other way around (soil+wood or soil*wood) I get very different sig > values, even accounting for the fact they vary slightly due to the > permutations. So whats is going on and why to the the values change so > much? You can isolate the effects due to different permutations being used by setting a seed via set.seed(). As ?adonis says, sequential sums of squares are used. If there is imbalance in your design, it isn't surprising that the results are not invariant to the ordering of terms in the formula. > I was also wondering in Adonis, can you nest treatments, so see effect > of soil removing the effect of woodland as you can with anova? Not 100% sure what you mean by nested, but adonis() uses the full functionality of R's formula interface. See the R manual for details, or ?formula. ?adonis also has details of how you might test a nested design in the Details section; this might not be what you want, but it does allow you to test for an effect of one variable by conditioning the permutations on another. > Another general questions as well, if I have more than two groups in a > treatment, say for soil, clay, sand, loam and do the stats, and I get > a significant value, what does it actually mean, is it that soil > generally has an effect, with each group separate, or there are > general differences between soils which may be one group is very > different to the other two? The permutation test tests at the level of the factor, not pairwise comparisons of the levels within the factor. So you get information on Soil, not on the individual Clay, Sand and Loam levels. This is the same as you would get if you did anova(mod) where mod was a linear model with a factor predictor.
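The order dependence Gavin describes is not specific to adonis(); anova() on an ordinary lm() fit also reports sequential (Type I) sums of squares. A minimal sketch with a tiny, deliberately unbalanced, made-up data set (the y, wood and soil values are hypothetical, not Ashley's data): because wood and soil are correlated, the sum of squares credited to each factor depends on which one enters the formula first, while the residual sum of squares does not.

```r
# Hypothetical, deliberately unbalanced data: 'wood' and 'soil' are correlated.
dat <- data.frame(
  y    = c(3, 4, 5, 7, 8, 10),
  wood = factor(c("wood", "wood", "wood", "wood", "grass", "grass")),
  soil = factor(c("clay", "clay", "clay", "sand", "sand", "sand"))
)

# anova() on an lm fit uses sequential sums of squares: each term is
# adjusted only for the terms listed before it in the formula.
ss_wood_first <- anova(lm(y ~ wood + soil, data = dat))[["Sum Sq"]]
ss_soil_first <- anova(lm(y ~ soil + wood, data = dat))[["Sum Sq"]]

ss_wood_first  # wood, soil, residuals: about 24.083, 6.750, 4.000
ss_soil_first  # soil, wood, residuals: about 28.167, 2.667, 4.000
```

On the permutation side, calling set.seed() before each adonis() call, as Gavin suggests, makes the permutation p-values repeatable, so any change that remains after swapping the terms reflects the sequential sums of squares rather than different permutations.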
betadisper(), the sister function to adonis(), tests for differences in multivariate dispersions rather than differences in multivariate means, and it does allow the sorts of pairwise tests you are thinking of; we haven't implemented this in adonis() yet, I'm afraid. HTH G > Many many thanks to anyone who can help me as I have asked people who > use R near me and no-one is sure and uses Adonis.. > > Ash > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% From gavin.simpson at ucl.ac.uk Tue Oct 4 22:41:06 2011 From: gavin.simpson at ucl.ac.uk (Gavin Simpson) Date: Tue, 04 Oct 2011 21:41:06 +0100 Subject: [R] Adonis and nmds help and questions for a novice. In-Reply-To: References: Message-ID: <1317760866.2764.21.camel@chrysothemis.geog.ucl.ac.uk> On Tue, 2011-10-04 at 08:45 +0000, Ashley Houlden wrote: > Hi, > #distance matrix of samples using bray curtis > d= bcdist(mydata, rmzero=FALSE) In addition, you don't necessarily need ecodist for the Bray-Curtis distance; vegdist() in vegan will compute this for you. Not that there is anything wrong with ecodist, I hasten to add; it's just that you can do this all in vegan if you want. G > And then using the distance matrix from this to use for adoins? Is this correct.
> > With this I have then run Adonis > > results = adonis(d ~ wood, envdata_fit, permutations = 1000) > > and get significant values to see if sig diff in diversity between wood and grass habitat. > > However I have been reading about combining the variables, but there seems to be different ways for example > > results = adonis(d ~ wood+soil, envdata_fit, permutations = 1000) > > so get sig values for Wood and soil > > or > > results = adonis(d ~ wood*soil, envdata_fit, permutations = 1000) > > And I get sig values for wood, soil, and wood soil interaction. > > This seems to make sense, however for both if I put the variable the other way around (soil+wood or soil*wood) I get very different sig values, even accounting for the fact they vary slightly due to the permutations. So whats is going on and why to the the values change so much? > > I was also wondering in Adonis, can you nest treatments, so see effect of soil removing the effect of woodland as you can with anova? > > Another general questions as well, if I have more than two groups in a treatment, say for soil, clay, sand, loam and do the stats, and I get a significant value, what does it actually mean, is it that soil generally has an effect, with each group separate, or there are general differences between soils which may be one group is very different to the other two? > > Many many thanks to anyone who can help me as I have asked people who use R near me and no-one is sure and uses Adonis.. > > Ash > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr.
Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% From d.scott at auckland.ac.nz Tue Oct 4 22:47:56 2011 From: d.scott at auckland.ac.nz (David Scott) Date: Wed, 05 Oct 2011 09:47:56 +1300 Subject: [R] Tinn-R In-Reply-To: <009201cc82c3$04424fb0$7800a8c0@ATRPDC.ATRColumbia.Com> References: <009201cc82c3$04424fb0$7800a8c0@ATRPDC.ATRColumbia.Com> Message-ID: <4E8B70FC.6060408@auckland.ac.nz> On 5/10/2011 7:25 a.m., Charles McClure wrote: > I am new to R and have recently tried Tinn-R with very mixed and unexpected > results. Can you point me to a Tinn-R tutorial on the web or a decent > reference book? > > Thank you for your help; > > Charles McClure > cmcclure at atrcorp.com > cfmcclure at verizon.net > There is a free eBook on tinn-R available from Rmetrics: https://www.rmetrics.org/ebooks-tinnr Written by the authors of tinn-R. Please consider a donation to the Rmetrics Association. -- _________________________________________________________________ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142, NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.scott at auckland.ac.nz, Fax: +64 9 373 7018 From sarah.goslee at gmail.com Tue Oct 4 22:55:00 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Tue, 4 Oct 2011 16:55:00 -0400 Subject: [R] Adonis and nmds help and questions for a novice. 
In-Reply-To: <1317760866.2764.21.camel@chrysothemis.geog.ucl.ac.uk> References: <1317760866.2764.21.camel@chrysothemis.geog.ucl.ac.uk> Message-ID: On Tue, Oct 4, 2011 at 4:41 PM, Gavin Simpson wrote: > On Tue, 2011-10-04 at 08:45 +0000, Ashley Houlden wrote: >> Hi, > >> #distance matrix of samples using bray curtis >> d= bcdist(mydata, rmzero=FALSE) > > In addition, you don't necessarily need ecodist for the bray curtis > distance. vegdist() in vegan will compute this for you. > > Not that there is anything wrong with ecodist I hasten to add - just > that you can do this all in vegan if you wanted. Because ecodist is awesome. :) But there's no need to mix and match; vegan and ecodist do many of the same things (for historical reasons). Sarah -- Sarah Goslee http://www.functionaldiversity.org From bsmith030465 at gmail.com Tue Oct 4 22:56:37 2011 From: bsmith030465 at gmail.com (Brian Smith) Date: Tue, 4 Oct 2011 16:56:37 -0400 Subject: [R] ggplot2: changing default colors of boxplot Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From camelbbs at gmail.com Tue Oct 4 23:15:29 2011 From: camelbbs at gmail.com (chunjiang he) Date: Tue, 4 Oct 2011 16:15:29 -0500 Subject: [R] a question about sort and BH In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From diggsb at ohsu.edu Tue Oct 4 23:15:33 2011 From: diggsb at ohsu.edu (Brian Diggs) Date: Tue, 4 Oct 2011 14:15:33 -0700 Subject: [R] inconsistent behavior of summary function In-Reply-To: References: <5C294FB9-8384-4996-87D4-74CC0A40AA78@gmail.com> <1317711522584-3870106.post@n4.nabble.com> <4E8AC17F.2020200@xtra.co.nz> Message-ID: <4E8B7775.50709@ohsu.edu> I'm going to put on my fire suit and wade in (see inline) On 10/4/2011 8:11 AM, Bert Gunter wrote: > On Tue, Oct 4, 2011 at 7:42 AM, Jeanne M. 
Spicer wrote: > >> I'm not sure how returning an incorrect result is ever a 'positive' feature > > It is **not** "incorrect"; perhaps unexpected, but that is not the same. > "You are technically correct -- the best kind of correct" -- Futurama The results (using the built-in data set rock)

> summary(rock["area"])
      area
 Min.   : 1016
 1st Qu.: 5305
 Median : 7487
 Mean   : 7188
 3rd Qu.: 8870
 Max.   :12212
> summary(rock[["area"]])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1016    5305    7487    7188    8870   12210

differ for exactly the reason you say (dispatching to different methods of summary), and the different values of max are both correct given the documentation. However, let's walk through what it takes to show that. In the help page for summary, an argument, digits, is described, which has the default value max(3, getOption("digits")-3). Executing this (or getOption("digits") alone and doing the math) results in the default value of digits being 4 (at least for me; and I do not believe that I have changed the option). So what is this argument used for? In the documentation, it says: "integer, used for number formatting with signif() (for summary.default) or format() (for summary.data.frame)." Let's assume that we realize that rock["area"] is a data frame, which would be handled by summary.data.frame, and rock[["area"]] is a vector, and further determine that summary.default is what will handle it (having not found summary.vector or summary.integer). Let's dive into the help pages for signif and format, since they are listed as relevant to the use of digits in the two different cases. signif tells us that digits is "integer indicating the number of ... significant digits (signif) to be used." Looking at "Details", the last sentence says "Each element of the vector is rounded individually, unlike printing."
So in the case of a vector, each value is separately rounded to 4 significant digits (the max of 12212 is rounded to 12210). format tells us that digits is "how many significant digits are to be used for numeric and complex x. ... This is a suggestion: enough decimal places will be used so that the smallest (in magnitude) number has this many significant digits, and also to satisfy nsmall." So the difference is that if it is a vector, each part (min, quartiles, mean, and max) is rounded to 4 significant digits individually, while if it is a column of a data frame, the set is collectively rounded so that the smallest has 4 significant digits and the rest are carried out to the same decimal place. Some points:

1) Both of these functions are in base, so I would expect the same behavior using the same (default) arguments. Yes, the key word is "expect." Hopefully I have demonstrated that I understand why they differ. I would not anticipate rounding, and when only one value has only one digit rounded, it is not really obvious that it happened. (As compared to, say, summary(11111*rock$area), if I knew the data was not all rounded to the nearest 10,000.) So this is not just a matter of realizing that different methods are being dispatched, but reading through three different help pages (at least three, assuming I started at the right place and realized which other two were the relevant ones) to see that the end results are presented differently WHICH I WOULD NOT REALIZE THAT I EVEN NEED TO DO.

2) rock$area is an integer vector, so even if I realize that rounding would be done on floating point numbers, I would not expect (yes, again, "expect") that integers would need to be rounded to some lesser number of significant digits.

3) The documentation for summary is actually wrong about digits for the case of summary.data.frame. Consider:

> summary(rock["area"], digits=17)
          area
 Min.   : 1016.0000000000000
 1st Qu.: 5305.2500000000000
 Median : 7487.0000000000000
 Mean   : 7187.7291666700003
 3rd Qu.: 8869.5000000000000
 Max.   :12212.0000000000000

In particular, note the mean. It is wrong (mathematically incorrect AND not consistent with the documentation).

> dput(mean(rock["area"]))
structure(7187.72916666667, .Names = "area")

Why? Internally, summary.data.frame calls summary.default on rock[["area"]] with a hard-coded digits value of 12, then takes this value and formats it with 17 digits of precision as requested. That's why there are the four zeros in the middle (the last digit being numerical imprecision due to the binary representation of floating point values).

4) summary.default does not necessarily honor the number of significant digits either:

> for(i in 1:9) print(summary(rock[["area"]], digits=i))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1000    5000    7000    7000    9000   10000
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1000    5300    7500    7200    8900   12000
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1020    5310    7490    7190    8870   12200
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1016    5305    7487    7188    8870   12210
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 1016.0  5305.2  7487.0  7187.7  8869.5 12212.0
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 1016.00  5305.25  7487.00  7187.73  8869.50 12212.00
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 1016.000  5305.250  7487.000  7187.729  8869.500 12212.000
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 1016.000  5305.250  7487.000  7187.729  8869.500 12212.000
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 1016.000  5305.250  7487.000  7187.729  8869.500 12212.000

Beyond 7, no additional significant digits are printed, despite the value of digits. This is the behavior of signif

> signif(mean(rock[["area"]]), digits=9)
[1] 7187.729

but is not consistent with the documentation (which says digits can be as large as 22).
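Point 4 can be probed without summary() at all. A short sketch using the rock data set shipped with R, assuming the default options(digits = 7): signif() keeps the requested 9 significant digits in the stored value, but printing or formatting without an explicit digits argument falls back to getOption("digits"), so only 7 appear. (summary.default() itself has changed in later R releases, which is why this sketch calls signif() directly.)

```r
# Mean of the 'area' column of the rock data set that ships with R.
x <- mean(rock$area)

# Round to 9 significant digits; the stored value keeps all 9.
x9 <- signif(x, digits = 9)

# Formatting or printing without a digits argument uses
# getOption("digits"), which defaults to 7.
format(x9)               # "7187.729"

# Requesting 9 digits at formatting time shows the retained precision.
format(x9, digits = 9)   # "7187.72917"
print(x9, digits = 9)
```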
>> but at least the documentation could more clearly warn users that this >> method behaves differently in these cases -- summary(rock[,1]) vs >> summary(rock[,1:2]) -- and that the method can and *does* return incorrect >> results without any warning messages. >> > > What is (in)adequate in documentation is often in the mind of the beholder. > > Note: >> class(rock[,1]) > [1] "integer" > >> class(rock[,1:2]) > [1] "data.frame" > > This means that different methods are dispatched, leading to the different > results. Morever, >> summary(rock[,1,drop=FALSE]) > area > Min. : 1016 > 1st Qu.: 5305 > Median : 7487 > Mean : 7188 > 3rd Qu.: 8870 > Max. :12212 > > ... and that is because >> class(rock[,1,drop=FALSE]) > [1] "data.frame" > > So the relevant Help file is ?"[.data.frame" That certainly explains the reasoning for the different dispatches, but is only the start of understanding what is going on. The data frame method does rather what you would expect (since format tends to be less surprising from an output point of view). Consider another example:

> summary(11111*rock["area"])
      area
 Min.   : 11288776
 1st Qu.: 58946633
 Median : 83188057
 Mean   : 79862859
 3rd Qu.: 98549015
 Max.   :135687532
> summary(11111*rock[["area"]])
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 11290000  58950000  83190000  79860000  98550000 135700000

Both of these have a digits value of 4 (the default), but the data frame one "ignores" it (or, more accurately, format takes it as a recommendation but prints all values down to the ones place despite only 4 significant digits being requested, probably due to nsmall being 0). The default method dutifully rounds each value to the requested default of 4 significant digits.
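The two rounding rules can be compared on single values. A minimal sketch using the maximum from the 11111*rock example: signif() rounds to 4 significant digits whatever the magnitude (the rule the thread attributes to summary.default()), whereas format() treats digits as a suggestion and, when it chooses fixed notation, still prints every integer digit.

```r
# Largest value from the 11111 * rock example: 11111 * 12212 = 135687532.
max_area <- 11111 * max(rock$area)

# signif(): rounds each value individually to 4 significant digits.
signif(max_area, digits = 4)     # 135700000

# format(): digits is only a suggestion; fixed notation keeps all
# integer digits, so nothing is rounded away here.
format(max_area, digits = 4)     # "135687532"

# The same contrast on the unscaled maximum of rock$area.
signif(12212, digits = 4)        # 12210
format(12212, digits = 4)        # "12212"
```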
In addition, when you >> only ask for one column you not only get the correct results, you also get a >> bonus distribution plot. >> >> I'd would like all of our students to use R, but little things like this >> are huge stumbling blocks for them. >> > > I have no doubt that this is true. R is powerful, flexible and, as an > inevitable result, complex. To master it, honest effort is required, > probably a somewhat scarce commodity in introductory classes, especially for > non-statisticians. For that reason, there are numerous learning resources > available, to be found on CRAN. Have you looked at them? Moreover,there are > several R GUI's that attempt to shield the beginner from the initial shock, > to be found in the R-GUIs link under "Other Projects." Have you considered > those? > > So I think something more than righteous indignation is called for here. > Nevertheless, the bottom line is that you get what you pay for: R **IS** > hard -- but for many serious data analysts of all stripes, worth the effort. I saw it as more exasperation at inconsistencies rather than righteous indignation. There is much power in R, and there are many subtle points (to which the existence of the R Inferno attests). Certainly the more complicated a task is undertaken, the more subtleties are to be expected. But to have to track subtle rounding issues for a simple summary of a set of numbers (depending on how exactly the summary is requested) was where I thought the frustration was coming from. > Cheers, > Bert > >> -jeanne >> -- Brian S. 
Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University From jricci at corcare.net Tue Oct 4 22:28:10 2011 From: jricci at corcare.net (jricci) Date: Tue, 4 Oct 2011 13:28:10 -0700 (PDT) Subject: [R] how do i put two scatterplots on same graph In-Reply-To: <38FDA808-D79C-45DA-8F65-EB0651CB8663@revelle.net> References: <1317709183813-3870030.post@n4.nabble.com> <4E8ACFDF.9070108@knmi.nl> <38FDA808-D79C-45DA-8F65-EB0651CB8663@revelle.net> Message-ID: <509B10C9D20E7041BE0D85FB1C7F8D64105CCB4D6F@cc-mail.corcare.priv> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djmuser at gmail.com Tue Oct 4 23:34:28 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 4 Oct 2011 14:34:28 -0700 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: References: <4E89EFE3.5050804@noaa.gov> Message-ID: Hi Hadley: When I tried your function on the example data, I got the following:

dd <- data.frame(year = rep(2000:2008, each = 500), y = rnorm(4500))
g <- function(df, qs = c(.05, .25, .50, .75, .95)) {
  data.frame(q = qs, quantile(d$y, qs))
}
ddply(dd, .(year), g)

> ddply(dd, .(year), g)
   year    q quantile.d.y..qs.
1  2000 0.05                NA
2  2000 0.25                NA
3  2000 0.50                NA
...
43 2008 0.50                NA
44 2008 0.75                NA
45 2008 0.95                NA
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'

This, however, does work (with a likely fix to the variable name afterwards):

g <- function(df, qs = c(.05, .25, .50, .75, .95)) {
  data.frame(q = qs, quantile(d[, 2], qs))
}

> ddply(dd, .(year), g)
  year    q quantile.d...2...qs.
1 2000 0.05          -1.36670724
2 2000 0.25          -0.97786897
3 2000 0.50          -0.05982217
4 2000 0.75           0.33576399
5 2000 0.95           1.30389105
...

Dennis On Tue, Oct 4, 2011 at 12:10 PM, Hadley Wickham wrote:
>> # Function to compute quantiles and return a data frame
>> g <- function(d) {
>>   qq <- as.data.frame(as.list(quantile(d$y, c(.05, .25, .50, .75, .95))))
>>   names(qq) <- paste('Q', c(5, 25, 50, 75, 95), sep = '')
>>   qq   }
>
> You could cut out the melt step by making this return a data frame:
>
> g <- function(df, qs = c(.05, .25, .50, .75, .95)) {
>   data.frame(q = qs, quantile(d$y, qs))
> }
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>
From Thomas.Adams at noaa.gov Tue Oct 4 23:38:14 2011 From: Thomas.Adams at noaa.gov (Thomas Adams) Date: Tue, 04 Oct 2011 17:38:14 -0400 Subject: [R] Question about ggplot2 and stat_smooth In-Reply-To: References: <4E89EFE3.5050804@noaa.gov> Message-ID: <4E8B7CC6.5070307@noaa.gov> Hadley: Below is an example of what I am trying to do; I just don't understand how to supply the limits to the blue and pink shaded regions and the values of the black line, which are meant to represent, from bottom to top, the 5%, 25%, 50%, 75%, 95% limits that I get from quantile():

h + geom_ribbon(aes(ymin=level-2, ymax=level+2),fill='pink')+
geom_ribbon(aes(ymin=level-1, ymax=level+1),fill='light blue')+
geom_line(aes(y=level))

My apologies for not explaining what I was after better previously. Regards, Tom On 10/4/11 1:01 PM, Thomas.Adams at noaa.gov wrote: > Hadley, > > Thanks for responding. No, not smoothed quantile regression. If you go here: http://www.erh.noaa.gov/mmefs/index.php and click on one of the colored squares, you can see we have 'boxplots'. What I want to express is the uncertainty as depicted in the example from my previous email where I can specify the limits calculated for the 'boxplots' using 5%, 25%,75%, 95% limits as we have with the 'boxplots'.
> > Tom > > ----- Original Message ----- > From: Hadley Wickham > Date: Tuesday, October 4, 2011 10:23 am > Subject: Re: [R] Question about ggplot2 and stat_smooth > To: Thomas Adams > Cc: R-help forum > > >> On Mon, Oct 3, 2011 at 12:24 PM, Thomas Adams >> wrote: >>> I'm interested in creating a graphic -like- this: >>> >>> c<- ggplot(mtcars, aes(qsec, wt)) >>> c + geom_point() + stat_smooth(fill="blue", colour="darkblue", >> size=2, alpha >>> = 0.2) >>> >>> but I need to show 2 sets of bands (with different shading) using >> 5%, 25%, >>> 75%, 95% limits that I specify and where the heavy blue line is the >> median. >>> I don't understand how to do this with ggplot2. >> Exactly what sort of limits do you want? It sounds like maybe you are >> looking for smoothed quantile regression. >> >> Hadley >> >> -- >> Assistant Professor / Dobelman Family Junior Chair >> Department of Statistics / Rice University >> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: thomas.adams at noaa.gov VOICE: 937-383-0528 FAX: 937-383-0033 From angerusso1980 at gmail.com Tue Oct 4 23:44:54 2011 From: angerusso1980 at gmail.com (Angel Russo) Date: Tue, 4 Oct 2011 17:44:54 -0400 Subject: [R] Assigning genes to CBS segmented output: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From carl at witthoft.com Wed Oct 5 00:09:55 2011 From: carl at witthoft.com (Carl Witthoft) Date: Tue, 04 Oct 2011 18:09:55 -0400 Subject: [R] How many pixels or steps in rasterImage interpolation? Message-ID: <4E8B8433.3020502@witthoft.com> Hi, I'm looking at a very simple picture created with rasterImage. 
foo <- matrix(1:9,3,3)
foo[,] <- rainbow(9)
plot(0:1,0:1,t='n')
rasterImage(foo,0,0,1,1)

If I choose to specify interpolate=F, I get the expected 9 blocks of color. My question is: how many values (aka pixels) does rasterImage use as it interpolates from one input cell to the next? And I suppose I should ask if there's a way to control the size of the interpolation region as well. I do recognize that rasterImage doesn't return anything, so it's not as though I'm trying to save a matrix of the interpolated pixel values. I just would like to get a feel for what's being done to my source data. thanks Carl -- ----- Sent from my Cray XK6 From emammendes at gmail.com Wed Oct 5 00:22:33 2011 From: emammendes at gmail.com (Eduardo M. A. M.Mendes) Date: Tue, 4 Oct 2011 19:22:33 -0300 Subject: [R] Problems loading package hydroTSM Message-ID: <036801cc82e4$1ef26c20$5cd74460$@gmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From heverkuhn at gmail.com Wed Oct 5 00:52:22 2011 From: heverkuhn at gmail.com (Heverkuhn Heverkuhn) Date: Tue, 4 Oct 2011 17:52:22 -0500 Subject: [R] break.axis all range of data Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From NordlDJ at dshs.wa.gov Wed Oct 5 01:05:57 2011 From: NordlDJ at dshs.wa.gov (Nordlund, Dan (DSHS/RDA)) Date: Tue, 4 Oct 2011 16:05:57 -0700 Subject: [R] how do i put two scatterplots on same graph In-Reply-To: <509B10C9D20E7041BE0D85FB1C7F8D64105CCB4D6F@cc-mail.corcare.priv> References: <1317709183813-3870030.post@n4.nabble.com> <4E8ACFDF.9070108@knmi.nl> <38FDA808-D79C-45DA-8F65-EB0651CB8663@revelle.net> <509B10C9D20E7041BE0D85FB1C7F8D64105CCB4D6F@cc-mail.corcare.priv> Message-ID: <941871A13165C2418EC144ACB212BDB002123C14@dshsmxoly1504g.dshs.wa.lcl> > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of jricci > Sent: Tuesday, October 04, 2011 1:28 PM > To: r-help at r-project.org > Subject: Re: [R] how do i put two scatterplots on same graph > > I am new at this. The two data sets don't have color variable just > paired data. How should I structure the data sets in R? > Joe Ricci > > ________________________________ > From: William Revelle [via R] node+s789695n3871355h20 at n4.nabble.com> > To: Joe Ricci > Sent: Tue Oct 04 11:18:20 2011 > Subject: Re: how do i put two scatterplots on same graph > > > > If the data are from one data.frame (e.g., the iris data set), then > simply label the red and white flowers with different colors: > e.g., > > with the iris data set > > plot(iris$Sepal.Length,iris$Sepal.Width,col=c("red","blue","black")[iris$Species],pch=c(16:18)[iris$Species]) > > Bill > > > > > On Oct 4, 2011, at 4:20 AM, Paul Hiemstra wrote: > > > On 10/04/2011 06:19 AM, jricci wrote: > >> Have two sets of scatterplot data > >> hypothetically > >> a) stem lenght vs number of petals in red flowers > >> b) stem lenght vs number of petals in white flowers > >> > >> want to place on same scatter plot with same x,y axis but different collored > >> markers > >> > >> How do I do this in R > >> > >> -- > >> View this message in context:
http://r.789695.n4.nabble.com/how-do-i-put-two-scatterplots-on-same-graph-tp3870030p3870030.html > >> Sent from the R help mailing list archive at Nabble.com. > >> > >> ______________________________________________ > >> [hidden email] mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > > > Hi, > > > > You could take a look at the ggplot2 package. > > > > good luck, > > Paul > > > > -- > > Paul Hiemstra, Ph.D. > > Global Climate Division > > Royal Netherlands Meteorological Institute (KNMI) > > Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 > > P.O. Box 201 | 3730 AE | De Bilt > > tel: +31 30 2206 494 > > > > http://intamap.geo.uu.nl/~paul > > http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 > > > > ______________________________________________ > > [hidden email] mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > William Revelle http://personality-project.org/revelle.html > Professor http://personality-project.org > Department of Psychology http://www.wcas.northwestern.edu/psych/ > Northwestern University http://www.northwestern.edu/ > Use R for psychology http://personality-project.org/r > It is 6 minutes to midnight http://www.thebulletin.org > > ______________________________________________ > [hidden email] mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > This is where it is important to follow the posting guide (see the note immediately above about self-contained examples).
But if you have two data frames, you can plot one and then use the points() function to plot the data from the other on the same graph. Something like ##create some data red_flowers <- data.frame(stem.len=sample(7:15,25,replace=TRUE), num.petals=sample(35:55,25,replace=TRUE)) white_flowers <- data.frame(stem.len=sample(5:12,25,replace=TRUE), num.petals=sample(45:85,25,replace=TRUE)) ##plot the red flowers plot(red_flowers$stem.len, red_flowers$num.petals, col='red', xlim=c(5,15), ylim=c(35,85)) ##use points() to plot the white flowers points(white_flowers$stem.len, white_flowers$num.petals) You will need to make sure you set the x and y axis limits so as not to truncate values in either data frame. Hope this is helpful, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 From nfaux at unimelb.edu.au Wed Oct 5 01:14:31 2011 From: nfaux at unimelb.edu.au (Noel Faux) Date: Wed, 5 Oct 2011 10:14:31 +1100 Subject: [R] ggplot2: not displaying annotation (label = expression) in/on graph Message-ID: <4300C8F5-8866-4F19-9E44-514553DEC051@unimelb.edu.au> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djmuser at gmail.com Wed Oct 5 01:28:44 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 4 Oct 2011 16:28:44 -0700 Subject: [R] F-values in nested designs In-Reply-To: References: Message-ID: Hi: > INB4: if I have a nested design with treatment A and treatment B > within A, F-values are MSA/MSA(B) and MSA(B)/MSE, correct? How can I > make R give these values directly, without further coding? This is how to get an equivalent model in lme4, but it probably isn't what you expect (particularly the 'without further coding' part). 
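For comparison, the two ratios asked about in INB4 (MSA/MSA(B) for areas and MSA(B)/MSE for sites within areas) can also be computed from an ordinary fixed-effects fit of the data in the question. This is a minimal sketch that assumes the balanced normal-theory model underlying the Minitab table. It only rearranges the rows of anova() and calls pf(), so the caveats about testing the sites variance component still apply.

```r
# Data from the question: 3 areas, 4 sites per area, 3 replicates per site
dn <- data.frame(
  y = c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11,
        11, 13, 9, 10, 14, 11, 10, 9, 8, 9, 8, 8,
        13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4),
  areas = factor(rep(c("m1", "m2", "m3"), each = 12)),
  sites = factor(rep(1:4, 9))
)

# areas/sites expands to areas + areas:sites (sites nested within areas)
tab <- anova(lm(y ~ areas/sites, data = dn))
ms  <- tab[["Mean Sq"]]  # rows: areas, areas:sites, Residuals
dfs <- tab[["Df"]]

# Nested-design F ratios: areas is tested against sites-within-areas,
# and sites-within-areas is tested against the residual mean square
F_areas <- ms[1] / ms[2]
F_sites <- ms[2] / ms[3]
p_areas <- pf(F_areas, dfs[1], dfs[2], lower.tail = FALSE)
p_sites <- pf(F_sites, dfs[2], dfs[3], lower.tail = FALSE)

nested_tab <- data.frame(
  Df = dfs[1:2], MS = ms[1:2],
  F = c(F_areas, F_sites), P = c(p_areas, p_sites),
  row.names = c("areas", "sites(areas)")
)
print(round(nested_tab, 4))
```

With these data the two ratios should match Minitab's F values of 0.158 and 3.167.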
Using your example, library('lme4') dn <- data.frame( y = c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14, 11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4), areas = factor(rep(c("m1", "m2", "m3"), each=12)), sites = factor(rep(1:4, 9))) > a <- lmer(y ~ areas + (1 | areas:sites), data = dn) > a Linear mixed model fit by REML Formula: y ~ areas + (1 | areas:sites) Data: dn AIC BIC logLik deviance REMLdev 171.1 179 -80.56 167 161.1 Random effects: Groups Name Variance Std.Dev. areas:sites (Intercept) 3.25 1.8028 Residual 4.50 2.1213 Number of obs: 36, groups: areas:sites, 12 Fixed effects: Estimate Std. Error t value (Intercept) 10.750 1.090 9.865 areasm2 -0.750 1.541 -0.487 areasm3 -0.750 1.541 -0.487 Correlation of Fixed Effects: (Intr) aresm2 areasm2 -0.707 areasm3 -0.707 0.500 ##------ lme4 reports the estimated variance components and their square roots, the standard deviation components (Std.Dev). The estimated residual variance component is 4.5, which is the same as the residual MSE from the Minitab output. The estimated variance component associated with sites nested within areas (areas:sites) is 3.25. Since the design is balanced, the expected mean square of this term (assuming the model assumptions are correct) is $\sigma_e^2 + 3 \sigma_s^2$, which is estimated by 4.5 + 3(3.25) = 14.25, the observed mean square for sites within areas, again coinciding with the Minitab output. However, lmer() does not report the result of an F-test for the 'significance' of the sites variance component, because the null hypothesis $\sigma_s^2 = 0$ is on the boundary of the parameter space and there are questions about the reliability of p-values for such tests. See http://rwiki.sciviews.org/doku.php?id=guides:lmer-tests In other words, don't accept the reported p-value re the sites variance from Minitab on faith. This answers (even in multi-stratum models in aov() ) > 2) why I don't have an F-value for the nested effect? 
I realize that R > call it as Residuals in the first part of the summary, but there is a > way to make R consider it as another factor? To get the fixed effects part of Minitab's ANOVA table with lmer(), > anova(a) Analysis of Variance Table Df Sum Sq Mean Sq F value areas 2 1.4208 0.71039 0.1579 Once again, the p-value is not reported (by design). Assuming that the specified normal-theory based model is correct, the conventional F test for testing the null hypothesis of equal area means would be the mean square ratio of areas to sites, which would have an F(2, 9) distribution under the null hypothesis. The p-value of that test would be > 1 - pf(0.1579, 2, 9) [1] 0.85625 Apart from the needless test of the sites within areas variance component, the lmer() output corresponds to that of the Minitab table. The output from lmer() gives you the capacity to do much more, but it helps to understand some of the theory behind mixed models first. The transition from fixed effects ANOVA to random effects and mixed models is not a smooth one - multiple sources of random variation complicate both testing and confidence/prediction interval procedures, with several messages on R-sig-mixed-models (including the one cited above) discussing such issues at length. As I said, this is probably not what you expected. Dennis On Tue, Oct 4, 2011 at 11:17 AM, Marcus Nunes wrote: > Hello all > > I'm trying to learn how to fit a nested model in R. I found a toy > example on internet where a dataset that have 3 areas and 4 sites > within these areas. When I use Minitab to fit a nested model to this > data, this is the ANOVA table that I got: > > Nested ANOVA: y versus areas, sites > > Analysis of Variance for y > Source  DF        SS       MS      F      P > areas    2    4.5000   2.2500  0.158  0.856 > sites    9  128.2500  14.2500  3.167  0.012 > Error   24  108.0000   4.5000 > Total 
35  240.7500 > > When I use R, this is the ANOVA table that I got: > > summary(aov(y ~ areas + Error(areas%in%sites))) > > Error: areas:sites >           Df Sum Sq Mean Sq F value Pr(>F) > areas      2   4.50    2.25  0.1579 0.8563 > Residuals  9 128.25   14.25 > > Error: Within >           Df Sum Sq Mean Sq F value Pr(>F) > Residuals 24    108     4.5 > Warning message: > In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular > > The results are the same, except for one F-value and I don't > understand why. Hence, these are my questions: > > 1) I searched google and I can't find a reason to have this warning in > my code. Why is this happening? > > 2) why I don't have an F-value for the nested effect? I realize that R > call it as Residuals in the first part of the summary, but there is a > way to make R consider it as another factor? > > INB4: if I have a nested design with treatment A and treatment B > within A, F-values are MSA/MSA(B) and MSA(B)/MSE, correct? How can I > make R give these values directly, without further coding? > > Thanks for your help. > > Below is my code and information about my system. > ---------------------- > y = c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14, > 11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4) > areas = as.factor(rep(c("m1", "m2", "m3"), each=12)) > #sites = as.factor(c(rep(c(1, 2, 3, 4), 3), rep(c(5, 6, 7, 8), 3), > rep(c(9, 10, 11, 12), 3))) > sites = as.factor(c(rep(c(1, 2, 3, 4), 9))) > repl  = as.factor(rep(c(1, 2, 3), each=4, 3)) > > summary(aov(y ~ areas + Error(areas%in%sites))) > > summary(aov(y ~ areas + Error(areas%in%sites))) > Error: areas:sites >            Df Sum Sq Mean Sq F value Pr(>F) > areas      2   4.50    2.25  0.1579 0.8563 > Residuals  9 128.25   14.25 > Error: Within >            Df Sum Sq Mean Sq F value Pr(>F) > Residuals 24    108     
4.5 > Warning message: > In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular > > > > sessionInfo() > R version 2.13.1 Patched (2011-08-25 r56798) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] splines   stats     graphics  grDevices utils     datasets  methods > [8] base > > other attached packages: > [1] car_2.0-11         survival_2.36-9    nnet_7.3-1 > [4] MASS_7.3-14        lme4_0.999375-40   Matrix_0.999375-50 > [7] lattice_0.19-33    nlme_3.1-102 > > loaded via a namespace (and not attached): > [1] grid_2.13.1   stats4_2.13.1 tools_2.13.1 > -- > Marcus Nunes > marcus.nunes at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From mtmorgan at fhcrc.org Wed Oct 5 01:35:40 2011 From: mtmorgan at fhcrc.org (Martin Morgan) Date: Tue, 04 Oct 2011 16:35:40 -0700 Subject: [R] Assigning genes to CBS segmented output: In-Reply-To: References: Message-ID: <4E8B984C.4010606@fhcrc.org> On 10/04/2011 02:44 PM, Angel Russo wrote: > Hi All, > > I have a CBS segmentation algorithm output for 10 tumor samples each from 2 > different tumors. > > Now, I am in an urgent need to assign gene (followed by all genes present) > that belong to a particular segment after I removed all the CNVs from > segment data. The format of the data is: > > Sample Chromosome Start End Num_Probes Segment_Mean > Sample1A-TA 1 51598 76187 15 -1.115 Hi Angel -- In Bioconductor (http://bioconductor.org), for some model organisms you can create a data frame of known Entrez genes and their begin / end locations.
Start by installing the necessary software and data packages: source("http://bioconductor.org/biocLite.R") biocLite(c("org.Hs.eg.db", "GenomicRanges")) then load the library with annotations about genic coordinates: library(org.Hs.eg.db) anno = merge(toTable(org.Hs.egCHRLOC), toTable(org.Hs.egCHRLOCEND)) leading to > head(anno) gene_id Chromosome start_location end_location 1 10000 1 -243666483 -244006553 2 10000 1 -243666483 -244006553 3 10000 1 -243651534 -244006553 4 10000 1 -243651534 -244006553 5 100008586 X 49217770 49223847 6 100008586 X 49217770 49332715 For the simple question 'which genes are located on chromosome A starting at X and going to Y' you could use subset(anno, Chromosome=="A" & abs(start_location) > X & abs(end_location) < Y) This could also be done through the 'biomaRt' package or the GenomicFeatures / TxDb packages. To get this for many segments, filter 'anno' to remove funky genes, e.g., those that have negative length(!) idx = with(anno, abs(start_location) > abs(end_location)) anno = anno[!idx,] convert this to a GRanges object: library(GenomicRanges) gr = with(anno, GRanges(Chromosome, IRanges(abs(start_location), abs(end_location), names=gene_id))) convert your CBS result into a GRanges: seg = with(CBS, GRanges(Chromosome, IRanges(Start, End))) then find overlaps: olap = findOverlaps(gr, seg) Here 'gr' is called the 'query' and 'seg' is called the 'subject'. queryHits(olap) and subjectHits(olap) give equal-length vectors describing which queries overlap which subjects. You could group gene names by segment with split(names(gr)[queryHits(olap)], subjectHits(olap)) An important issue is to use the same genome build for annotations as you used for segmentation. Hope that helps / provides some hints for getting from A to B. Martin > 
> > Thanks so much, > Angel > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 From djmuser at gmail.com Wed Oct 5 01:41:58 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 4 Oct 2011 16:41:58 -0700 Subject: [R] ggplot2: changing default colors of boxplot In-Reply-To: References: Message-ID: Hi: Try this: p <- ggplot(mtcars, aes(factor(cyl), mpg)) p + geom_boxplot(aes(colour = factor(am)), fill = 'white') + scale_colour_manual('am', values = c('0' = 'blue', '1' = 'black')) HTH, Dennis On Tue, Oct 4, 2011 at 1:56 PM, Brian Smith wrote: > Hi, > > I wanted to change the default colors appearing in boxplot. For example, the > following code (from the package/documentation): > > =========== > library(ggplot2) > > p <- ggplot(mtcars, aes(factor(cyl), mpg)) > p + geom_boxplot(aes(fill = factor(am))) > > =========== > > Gives the default colors. What do I need to do to modify this so that: > > 1. Change the colors from green and red to blue and black > 2. Only have the outline of the boxplot colored (and not fill in the box) > > > thanks, > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From emammendes at gmail.com Wed Oct 5 02:07:32 2011 From: emammendes at gmail.com (Eduardo M. A. M.
Mendes) Date: Tue, 4 Oct 2011 21:07:32 -0300 Subject: [R] Strange error msg when plotting a graphics Message-ID: Dear R-Users, I have come across an error that apparently has nothing to do with the command itself. Here is the error (w is a matrix (or vector) and testXaxis is a vector of dates). > plot(data.frame(testXaxis,w),col="blue",ylab="Q, [m3/s]",xlab="Data", + main="Free-run - Modelo NARX MISO - Test Data") Error in gzfile(file, "wb") : cannot open the connection In addition: Warning message: In gzfile(file, "wb") : cannot open compressed file '/Users/eduardo/.rstudio-desktop/graphics/0540fa66-727a-4f8f-a1b5-f6501e96a393.snapshot', probable reason 'No such file or directory' Graphics error: Error in gzfile(file, "wb") : cannot open the connection The error did not stop the graphics from being plotted. What could be wrong? Many thanks Ed From rolf.turner at xtra.co.nz Wed Oct 5 03:09:29 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Wed, 05 Oct 2011 14:09:29 +1300 Subject: [R] Problem with .C In-Reply-To: <4E8AF632.1030803@mathematik.uni-marburg.de> References: <4D9AE45D.70205@mathematik.uni-marburg.de> <4E8AF632.1030803@mathematik.uni-marburg.de> Message-ID: <4E8BAE49.4070700@xtra.co.nz> On 05/10/11 01:04, Grigory Alexandrovich wrote: > Hello, > > I wrote a function in C, which works fine if called from the > main-function in C. > > But as soon as I try to call this function from R like .C('foo', > as.double(x), as.integer(y)), the program crashes. > > I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and > loaded it into R with dyn.load(). > > What can be the cause of such behaviour? > Again, the C function itself works, but not if called from R. It's impossible to say, with such minimal information, but a reasonable guess is that there is a problem with the declaration of "x" and "y" in foo.c. These would (I think) need to be declared as pointers (double * for x and int * for y), not as plain double and int, when foo is called from .C().
cheers, Rolf Turner From angerusso1980 at gmail.com Wed Oct 5 03:33:07 2011 From: angerusso1980 at gmail.com (Angel Russo) Date: Tue, 4 Oct 2011 21:33:07 -0400 Subject: [R] Assigning genes to CBS segmented output: In-Reply-To: <4E8B984C.4010606@fhcrc.org> References: <4E8B984C.4010606@fhcrc.org> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mailinglist.honeypot at gmail.com Wed Oct 5 04:30:42 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 4 Oct 2011 22:30:42 -0400 Subject: [R] Problem with .C In-Reply-To: <4E8AF632.1030803@mathematik.uni-marburg.de> References: <4D9AE45D.70205@mathematik.uni-marburg.de> <4E8AF632.1030803@mathematik.uni-marburg.de> Message-ID: Hi, As others have said, it's very difficult to help you without an example + code to know what you are talking about. That having been said, it seems as if you are just getting your feet wet in this R <--> C bridge, and I'd recommend you check out the "Rcpp" and "inline" packages to help make your life a lot easier ... -steve On Tue, Oct 4, 2011 at 8:04 AM, Grigory Alexandrovich wrote: > Hello, > > I wrote a function in C, which works fine if called from the main-function > in C. > > But as soon as I try to call this function from R like .C('foo', > as.double(x), as.integer(y)), the program crashes. > > I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and loaded > it into R with dyn.load(). > > What can be the cause of such behaviour? > Again, the C function itself works, but not if called from R. > > Thanks > Grigory Alexandrovich > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
> -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From izahn at psych.rochester.edu Wed Oct 5 04:52:50 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Tue, 4 Oct 2011 22:52:50 -0400 Subject: [R] texi2dvi problem when compiling incorrect Latex code In-Reply-To: <1317732235196-3870827.post@n4.nabble.com> References: <1317732235196-3870827.post@n4.nabble.com> Message-ID: Hi Syrvn, On Tue, Oct 4, 2011 at 8:43 AM, syrvn wrote: > Hello, > > I am working on a big R project using Eclipse/StatET/Texlipse. I'd like to > write a Latex document within that project but DO NOT want to Sweave it. > It's pure Latex. Via the external tools configurations I set up 2 different > versions to ensure that my latex document is processed correctly. > > Version 1 (System Call): > library(tools) > setwd("${container_loc}") > file = "${resource_loc:${source_file_path}}" > try(system(paste("texi2pdf", shQuote(file)), intern=TRUE)) This "works" (i.e. the latex errors print to screen but eventually return me to my R session) on my system (see sessionInfo below), running R in the terminal. > > > Version 2 (R Call): > library(tools) > setwd("${container_loc}") > texi2dvi(file = "${resource_loc:${source_file_path}}", pdf = TRUE, quiet = > FALSE) This also works for me (see definition of "works" above) provided that I omit the "quiet = FALSE" part.
R version 2.13.2 (2011-09-30) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base Best, Ista > > > Both versions work well as long as there is no error in my latex code. As > soon as there is an error > the process of texi2pdf / texi2dvi is not finished as the programme waits > for user input (mostly just press "enter" key). The problem is that R > outputs the output only after the whole programme finished so I always end > up having to kill my R console. > > Is there any workaround for that? > > Syrvn > > > > -- > View this message in context: http://r.789695.n4.nabble.com/texi2dvi-problem-when-compiling-incorrect-Latex-code-tp3870827p3870827.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From rhelpacc at gmail.com Wed Oct 5 05:41:06 2011 From: rhelpacc at gmail.com (Robert A'gata) Date: Tue, 4 Oct 2011 23:41:06 -0400 Subject: [R] AsOf join in R Message-ID: Hi, I tried to google for any solution for asof join operator in R. But I couldn't find one. The asof join operator AsOf(A,B) merges 2 time series by looking for latest available value of B prior to each time point in A. 
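The as-of rule described above can be sketched in base R with findInterval(). With left.open = TRUE (available in R 3.3.0 and later), it counts how many sorted B time stamps fall strictly before each A time stamp, and that count is the position of the value to carry forward. This sketch works on plain time and value vectors rather than xts objects; for xts series, index() and coredata() would supply those vectors. The function name asof_join is hypothetical.

```r
# As-of join: for each time in A, take the most recent B value observed
# strictly before that time (NA if B has no earlier observation).
asof_join <- function(timeA, valA, timeB, valB) {
  ord <- order(timeB)          # findInterval() needs sorted break points
  timeB <- timeB[ord]
  valB <- valB[ord]
  # left.open = TRUE: idx[i] = number of B times strictly less than timeA[i]
  idx <- findInterval(as.numeric(timeA), as.numeric(timeB), left.open = TRUE)
  B <- rep(NA_real_, length(timeA))
  B[idx > 0] <- valB[idx[idx > 0]]
  data.frame(time = timeA, A = valA, B = B)
}

tA <- as.Date(c("2011-09-01", "2011-09-09", "2011-09-10", "2011-09-15"))
tB <- as.Date(c("2011-08-31", "2011-09-09", "2011-09-11", "2011-09-12"))
res <- asof_join(tA, c(10, 15, 20, 25), tB, c(1.1, 1.5, 1.3, 1.7))
print(res)
```

On the example data, this gives B = 1.1, 1.1, 1.5, 1.7, which is the output requested below. Dropping left.open = TRUE would instead also accept a B value stamped at exactly the same time as the A value.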
For example, A <- xts(c(10,15,20,25), order.by=as.POSIXct(c("2011-09-01","2011-09-09","2011-09-10","2011-09-15"))) B <- xts(c(1.1,1.5,1.3,1.7), order.by=as.POSIXct(c("2011-08-31","2011-09-09","2011-09-11","2011-09-12"))) AsOf(A,B) should return A B 2011-09-01 10 1.1 2011-09-09 15 1.1 # (because the latest value of B prior to 2011-09-09 is 1.1) 2011-09-10 20 1.5 2011-09-15 25 1.7 How do I write the above AsOf function in R? The merge function does not do what I want, because it aligns points that have the same time stamp, while what I actually want is the latest value prior to each time stamp in A. Any example would be greatly appreciated. Thank you. Cheers, Robert From bonda at hsu-hh.de Tue Oct 4 23:39:33 2011 From: bonda at hsu-hh.de (bonda) Date: Tue, 4 Oct 2011 14:39:33 -0700 (PDT) Subject: [R] gefp() boundaries? Message-ID: <1317764373747-3872529.post@n4.nabble.com> Hello all, I have the following two questions: 1) how can I get the values of boundaries for fluctuation process gefp(), for functionals maxBB, meanL2BB, etc? 2) how can I get fragments of gefp()-process, e.g., if I have n=200 observations, i=1,2,...,200, and need gefp()[50:100], i.e. from i=50 to i=100? Thank you in advance, Julia -- View this message in context: http://r.789695.n4.nabble.com/gefp-boundaries-tp3872529p3872529.html Sent from the R help mailing list archive at Nabble.com. From donaldngwe at gmail.com Wed Oct 5 00:21:15 2011 From: donaldngwe at gmail.com (darkgaze) Date: Tue, 4 Oct 2011 15:21:15 -0700 (PDT) Subject: [R] Create combinations of rows Message-ID: <1317766875225-3872641.post@n4.nabble.com> I don't quite know how to word what I want, but if I have (1, 2, 3); (a, b, c); (x, y) I want: 1 a x 1 b x 1 c x 1 a y 1 b y 1 c y 2 a ... and so forth What is the appropriate command? Best, Don -- View this message in context: http://r.789695.n4.nabble.com/Create-combinations-of-rows-tp3872641p3872641.html Sent from the R help mailing list archive at Nabble.com.
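Base R's expand.grid() builds every combination of its arguments, with the first argument varying fastest. Here is a minimal sketch (the column names num, let and end are hypothetical) that varies the letters fastest, then x/y, then the numbers. It then reorders the columns so that the rows follow the order shown in the question:

```r
# Letters vary fastest, then x/y, then 1:3 (slowest)
g <- expand.grid(let = c("a", "b", "c"), end = c("x", "y"), num = 1:3,
                 stringsAsFactors = FALSE)

# Put the columns in the "1 a x" order used in the question
combos <- g[, c("num", "let", "end")]
print(head(combos, 7))
```

This gives 3 x 3 x 2 = 18 rows. Listing the arguments in a different order changes which column cycles fastest.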
From scott.raynaud at yahoo.com Wed Oct 5 03:53:19 2011 From: scott.raynaud at yahoo.com (Scott Raynaud) Date: Tue, 4 Oct 2011 18:53:19 -0700 (PDT) Subject: [R] SPlus to R Message-ID: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From tomdharray at gmail.com Wed Oct 5 04:52:52 2011 From: tomdharray at gmail.com (Tom D. Harray) Date: Tue, 04 Oct 2011 22:52:52 -0400 Subject: [R] fgrep with caret (^) meta-character in system() call Message-ID: <4E8BC684.50600@gmail.com> Hi there, I would like to use my Linux system's fgrep to search for a text pattern in a file. Calling system with system("fgrep \"SearchPattern\" /path/to/the/textFile.txt") works in general, but I need to search for the search pattern at the beginning of the line. The corresponding shell command fgrep "^SearchPattern" /path/to/the/textFile.txt | |___ here's my problem does exactly what I want. I tried various combinations of ", \", \^, but failed to make system() work. How can I call the working shell command including the caret meta-character with system()? Thanks and regards, dirk P.S.: Actually I have to search for about 5,000 patterns, stored in an R list, in a text file with about 30,000,000 lines. The patterns appear in one or more lines of the text file. Only the lines in which a pattern appears at the beginning of the line have to be extracted. Example with matching line 1, non-matching line 2, non-matching line 3 (line 3 contains aaa, but not at the beginning of the line): SearchPattern = "^aaa" Text file: aaaooooooooooo bbbiiiiiiiiiii aacttttttttaaa Going line by line through the file in R is too slow, and I cannot program it in C or C++. Hence I use the fgrep command. I would appreciate it if anyone has a fast alternative which works with R on Linux and Windows systems.
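A side note on the system() call: fgrep (grep -F) treats every pattern as a fixed string, so a ^ is matched literally rather than as a line anchor. An anchored search needs plain grep, for example system("grep '^SearchPattern' /path/to/the/textFile.txt"). For many fixed prefixes, an R-only alternative is to compare the first k characters of every line against the set of patterns of length k. This needs one vectorised substr() and %in% per distinct pattern length rather than one scan per pattern. The function name lines_starting_with is hypothetical, and this sketch assumes the lines fit in memory.

```r
# Keep only the lines that start with one of many fixed prefixes.
# Prefixes are grouped by length so that each group needs a single
# vectorised substr() + %in% lookup instead of a regex scan per pattern.
lines_starting_with <- function(lines, prefixes) {
  keep <- logical(length(lines))
  for (k in unique(nchar(prefixes))) {
    group <- prefixes[nchar(prefixes) == k]
    keep <- keep | (substr(lines, 1, k) %in% group)
  }
  lines[keep]
}

txt <- c("aaaooooooooooo", "bbbiiiiiiiiiii", "aacttttttttaaa")
print(lines_starting_with(txt, "aaa"))
```

For a 30-million-line file, one option is to open a connection and read the lines in chunks with readLines(con, n = ...), applying the function to each chunk.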
From dmfall2010 at yahoo.com Wed Oct 5 05:02:10 2011 From: dmfall2010 at yahoo.com (William Claster) Date: Tue, 4 Oct 2011 20:02:10 -0700 (PDT) Subject: [R] experimenting (like Weka Experimenter) Message-ID: <1317783730.68045.YahooMailNeo@web113415.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ahoerner at rprogress.org Wed Oct 5 06:27:54 2011 From: ahoerner at rprogress.org (andrewH) Date: Tue, 4 Oct 2011 21:27:54 -0700 (PDT) Subject: [R] reporting multiple objects out of a function Message-ID: <1317788874982-3873380.post@n4.nabble.com> Dear folks, I'm trying to build a function to create and make available some variables I frequently use for testing purposes. Suppose I have a function that takes some inputs and creates (internally) several named objects. Say, fun1 <- function(x, y, z) {obj1 <- x; obj2 <- y; obj3 <- z } Here is the challenge: After I run it, I want the objects to be available in the calling environment, but not necessarily in the global environment. I want them to be individually available, not as part of a list or some larger object. I cannot figure out how to do this. If I understand the situation correctly, I am trying to move several separate objects from the environment of the function to the environment in which the function was invoked (the "calling environment," yes?). I'm pretty sure there is a command to do this, but I'm not sure how to find it. Any help would be greatly appreciated, either on the necessary code, or on how to search for it, or a reference to a good discussion of this family of problems. Sincerely, andrewH -- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3873380.html Sent from the R help mailing list archive at Nabble.com.
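One common idiom is to give the function an envir argument that defaults to parent.frame(), the environment of the caller, and then copy the named objects there with list2env(). Calling assign() once per object would also work. A minimal sketch, with the hypothetical names make_test_vars and use_them:

```r
# Create several named objects and copy them, individually, into the
# environment the function was called from (parent.frame() by default).
make_test_vars <- function(x, y, z, envir = parent.frame()) {
  obj1 <- x
  obj2 <- y
  obj3 <- z
  list2env(list(obj1 = obj1, obj2 = obj2, obj3 = obj3), envir = envir)
  invisible(NULL)
}

# Called from inside another function, the objects land in that
# function's environment, not in the global environment.
use_them <- function() {
  make_test_vars(1, 2, 3)
  obj1 + obj2 + obj3
}
print(use_them())
```

If make_test_vars() is called at the top level, the objects end up in the global environment, because that is then the calling environment.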
From vicvoncastle at gmail.com Wed Oct 5 07:07:42 2011 From: vicvoncastle at gmail.com (Ken) Date: Wed, 5 Oct 2011 01:07:42 -0400 Subject: [R] fgrep with caret (^) meta-character in system() call In-Reply-To: <4E8BC684.50600@gmail.com> References: <4E8BC684.50600@gmail.com> Message-ID: <2244BFB0-AA77-4620-A6C5-A5193B8A5877@gmail.com> man awk? I've used awk for similar tasks (if I am reading the post correctly.) Google-Fu should turn up some useful examples. Also awk should be on your linux installation in some form or another. Regards, Ken Hutchison On Oct 4, 2554 BE, at 10:52 PM, "Tom D. Harray" wrote: > Hi there, > > I would like to use my linux system's fgrep to search for a text pattern > in a file. Calling system with > > system("fgrep \"SearchPattern\" /path/to/the/textFile.txt") > > works in general, but I need to search for the search pattern at the > beginning of the line. > > The corresponding shell command > > fgrep "^SearchPattern" /path/to/the/textFile.txt > | > |___ here's my problem > > does exactly what I want. I tried various combinations on ", \", \^, but > failed to make system() work. > > How can I call the working shell command including the caret > meta-character with system()? > > Thanks and regards, > > dirk > > > P.S.: Actually I have to search for about 5.000 patterns, stored in an R > list, in a text file with about 30.000.000 lines. The patterns appear in > one or more lines of the text file. Only those lines have to be > extracted if the patterns at the beginning of the line. > > Example with matching line 1, non matching line 2, non-matching line 3 > (line three comprises aaa, but not at the beginning of the line 3): > > SearchPattern = "^aaa" > > Text file: aaaooooooooooo > bbbiiiiiiiiiii > aacttttttttaaa > > Going line by line through the file in R is too slow, and I cannot > program it in C or C++. Hence I use the fgrep command. I would > appreciate if anyone has a fast alternative which works with R on Linux > and Windows systems. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Peter.Alspach at plantandfood.co.nz Wed Oct 5 07:10:24 2011 From: Peter.Alspach at plantandfood.co.nz (Peter Alspach) Date: Wed, 5 Oct 2011 18:10:24 +1300 Subject: [R] Create combinations of rows In-Reply-To: <1317766875225-3872641.post@n4.nabble.com> References: <1317766875225-3872641.post@n4.nabble.com> Message-ID: <3CD374BF2C285940A54C0A71225AF7DB02C83D5B87@AKLEXM01.PFR.CO.NZ> Tena koe Don ?expand.grid HTH .... Peter Alspach > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of darkgaze > Sent: Wednesday, 5 October 2011 11:21 a.m. > To: r-help at r-project.org > Subject: [R] Create combinations of rows > > I don't quite know how to word what I want, but if I have > > (1, 2, 3); (a, b, c); (x, y) > > I want: > > 1 a x > 1 b x > 1 c x > 1 a y > 1 b y > 1 c y > 2 a > ... > > and so forth > > What is the appropriate command? > > Best, > Don > > -- > View this message in context: http://r.789695.n4.nabble.com/Create- > combinations-of-rows-tp3872641p3872641.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. The contents of this e-mail are confidential and may be subject to legal privilege. If you are not the intended recipient you must not use, disseminate, distribute or reproduce all or any part of this e-mail or attachments. 
If you have received this e-mail in error, please notify the sender and delete all material pertaining to this e-mail. Any opinion or views expressed in this e-mail are those of the individual sender and may not represent those of The New Zealand Institute for Plant and Food Research Limited. From Achim.Zeileis at uibk.ac.at Wed Oct 5 07:32:04 2011 From: Achim.Zeileis at uibk.ac.at (Achim Zeileis) Date: Wed, 5 Oct 2011 07:32:04 +0200 (CEST) Subject: [R] gefp() boundaries? In-Reply-To: <1317764373747-3872529.post@n4.nabble.com> References: <1317764373747-3872529.post@n4.nabble.com> Message-ID: On Tue, 4 Oct 2011, bonda wrote: > Hello all, > > I have the following two questions: 1) how can I get the values of > boundaries for fluctuation process gefp(), for functionals maxBB, meanL2BB, > etc? Both, maxBB and meanL2BB, are objects of class "efpFunctional" that contain several functions including $computeCritval and $boundary that compute the critical value c and base boundary b(t), respectively. The full boundary is the c * b(t). In case of maxBB and meanL2BB however, the boundary is constant anyway, i.e., b(t) = 1, so you essentially need only the critical value. ## compute and plot gefp library("strucchange") data("durab") scus <- gefp(y ~ lag, data = durab) plot(scus, functional = meanL2BB) ## add the critical value (= boundary here) in another color abline(h = meanL2BB$computeCritval(0.05, nproc = 2), col = 4) > 2) how can I get fragments of gefp()-process, e.g., if I have n=200 > observations, i=1,2,...,200, and need gefp()[50:100], i.e. from i=50 till > i=100? gefp_object$process has the cumulative score process as a "zoo" object which preserves the original time scale of the data (if any), e.g. proc <- scus$process window(proc, start = 2000, end = 2001) proc[11:20] Note, however, that the process has n+1 observations because an additional zero is added as the first element (to facilitate visualizations etc.). 
See ?gefp, ?efpFunctional for more information and in particular Zeileis A. (2006), Implementing a Class of Structural Change Tests: An Econometric Computing Approach. _Computational Statistics & Data Analysis_, *50*, 2987-3008. doi:10.1016/j.csda.2005.07.001. hth, Z > Thank you in advance, > Julia > > -- > View this message in context: http://r.789695.n4.nabble.com/gefp-boundaries-tp3872529p3872529.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From jwiley.psych at gmail.com Wed Oct 5 08:08:56 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 4 Oct 2011 23:08:56 -0700 Subject: [R] SPlus to R In-Reply-To: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: Hi Scott, I am not familiar with S-Plus (though many aspects are quite similar to R). I will say that your function looks approximately correct. I am not familiar with the ss.rand function. I searched, and found some things that I suspect are similar in the package MBESS, but without knowing more about it from S-Plus, it is tough to make a testable example. Do you have access to S-Plus? Can you provide more information about this function, what it does, what it is like, etc.? There are some active members of this list who are quite familiar with S-Plus, so one of them may be more insightful. Cheers, Josh On Tue, Oct 4, 2011 at 6:53 PM, Scott Raynaud wrote: > I'm trying to convert an S-Plus program to R. Since I'm a SAS programmer I'm not facile in either S-Plus or R, so I need some help. All I did was convert the underscores in S-Plus to the assignment operator <-. 
> Here are the first few lines of the S-Plus file:
>
> sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8,
>              tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5)
> {
> ### for method 1
> if (method==1) {
> ne1 _ ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01)
> return(ne=ne1)
>                }
>
>
> My translation looks like this:
>
> sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8,
>               tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5)
> {
> ### for method 1
> if (method==1) {
>  ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01)
>  return(ne=ne1)
>                }
>
> The program runs without throwing errors, but I'm not getting any output in the console. This is where it should be, right? I think I have this set up correctly. I'm using method=3, which only requires nc and d to be specified. Any ideas why I'm not seeing output?
>
> Here is the entire output:
>
>> ## sshc.ssc: sample size calculation for historical control studies
>> ## J. Jack Lee (jjlee at mdanderson.org) and Chi-hong Tseng
>> ## Department of Biostatistics, Univ. of Texas M.D. Anderson Cancer Center
>> ##
>> ## 3/1/99
>> ## updated 6/7/00: add loess
>> ##------------------------------------------------------------------
>> ######## Required Input:
>> #
>> # rc     number of response in historical control group
>> # nc     sample size in historical control
>> # d      target improvement = Pe - Pc
>> # method 1=method based on the randomized design
>> #        2=Makuch & Simon method (Makuch RW, Simon RM. Sample size considerations
>> #          for non-randomized comparative studies. J of Chron Dis 1980; 3:175-181.
>> #        3=uniform power method
>> ######## optional Input:
>> #
>> # alpha  size of the test
>> # power  desired power of the test
>> # tol    convergence criterion for methods 1 & 2 in terms of sample size
>> # tol1   convergence criterion for method 3 at any given obs Rc in terms of difference
>> #          of expected power from target
>> # tol2   overall convergence criterion for method 3 as the max absolute deviation
>> #          of expected power from target for all Rc
>> # cc     range of multiplicative constant applied to the initial values ne
>> # l.span smoothing constant for loess
>> #
>> # Note:  rc is required for methods 1 and 2 but not 3
>> #        method 3 returns the sample size needed for rc=0 to (1-d)*nc
>> #
>> ######## Output
>> # for methods 1 & 2: return the sample size needed for the experimental group (1 number)
>> #                    for given rc, nc, d, alpha, and power
>> # for method 3:      return the profile of sample size needed for given nc, d, alpha, and power
>> #                    vector $ne contains the sample size corresponding to rc=0, 1, 2, ... nc*(1-d)
>> #                    vector $Ep contains the expected power corresponding to
>> #                      the true pc = (0, 1, 2, ..., nc*(1-d)) / nc
>> #
>> #------------------------------------------------------------------
>> sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8,
> +              tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5)
> + {
> + ### for method 1
> + if (method==1) {
> + ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01)
> + return(ne=ne1)
> +                }
> + ### for method 2
> + if (method==2) {
> + ne<-nc
> + ne1<-nc+50
> + while(abs(ne-ne1)>tol & ne1<100000){
> + ne<-ne1
> + pe<-d+rc/nc
> + ne1<-nef(rc,nc,pe*ne,ne,alpha,power)
> + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne))
> + }
> + if (ne1>100000) return(NA)
> + else return(ne=ne1)
> + }
> + ### for method 3
> + if (method==3) {
> + if (tol1 > tol2/10) tol1<-tol2/10
> + ncstar<-(1-d)*nc
> + pc<-(0:ncstar)/nc
> + ne<-rep(NA,ncstar + 1)
> + for (i in (0:ncstar))
> + { ne[i+1]<-ss.rand(i,nc,d,alpha=.05,power=.8,tol=.01)
> + }
> + plot(pc,ne,type='l',ylim=c(0,max(ne)*1.5))
> + ans<-c.searchd(nc, d, ne, alpha, power, cc, tol1)
> + ### check overall absolute deviance
> + old.abs.dev<-sum(abs(ans$Ep-power))
> + ##bad<-0
> + print(round(ans$Ep,4))
> + print(round(ans$ne,2))
> + lines(pc,ans$ne,lty=1,col=8)
> + old.ne<-ans$ne
> + ##while(max(abs(ans$Ep-power))>tol2 & bad==0){  #### unnecessary ##
> + while(max(abs(ans$Ep-power))>tol2){
> + ans<-c.searchd(nc, d, ans$ne, alpha, power, cc, tol1)
> + abs.dev<-sum(abs(ans$Ep-power))
> + print(paste(" old.abs.dev=",old.abs.dev))
> + print(paste("     abs.dev=",abs.dev))
> + ##if (abs.dev > old.abs.dev) { bad<-1}
> + old.abs.dev<-abs.dev
> + print(round(ans$Ep,4))
> + print(round(ans$ne,2))
> + lines(pc,old.ne,lty=1,col=1)
> + lines(pc,ans$ne,lty=1,col=8)
> + ### add convex
> + ans$ne<-convex(pc,ans$ne)$wy
> + ### add loess
> + ###old.ne<-ans$ne
> + loess.ne<-loess(ans$ne ~ pc, span=l.span)
> + lines(pc,loess.ne$fit,lty=1,col=4)
> + old.ne<-loess.ne$fit
> + ###readline()
> + }
> + return(ne=ans$ne, Ep=ans$Ep)
> +                }
> + }
>>
>> ## needed for method 1
>> nef2<-function(rc,nc,re,ne,alpha,power){
> + za<-qnorm(1-alpha)
> + zb<-qnorm(power)
> + xe<-asin(sqrt((re+0.375)/(ne+0.75)))
> + xc<-asin(sqrt((rc+0.375)/(nc+0.75)))
> + ans<- 1/(4*(xc-xe)^2/(za+zb)^2-1/(nc+0.5)) - 0.5
> + return(ans)
> + }
>> ## needed for method 2
>> nef<-function(rc,nc,re,ne,alpha,power){
> + za<-qnorm(1-alpha)
> + zb<-qnorm(power)
> + xe<-asin(sqrt((re+0.375)/(ne+0.75)))
> + xc<-asin(sqrt((rc+0.375)/(nc+0.75)))
> + ans<-(za*sqrt(1+(ne+0.5)/(nc+0.5))+zb)^2/(2*(xe-xc))^2-0.5
> + return(ans)
> + }
>> ## needed for method 3
>> c.searchd<-function(nc, d, ne, alpha=0.05, power=0.8, cc=c(0.1,2),tol1=0.0001){
> + #---------------------------
> + # nc     sample size of control group
> + # d      the difference to detect between control and experiment
> + # ne     vector of starting sample size of experiment group
> + #    corresponding to rc of 0 to nc*(1-d)
> + # alpha  size of test
> + # power  target power
> + # cc  pre-screen vector of constant c, the range should cover the
> + #    the value of cc that has expected power
> + # tol1   the allowance between the expected power and target power
> + #---------------------------
> + pc<-(0:((1-d)*nc))/nc
> + ncl<-length(pc)
> + ne.old<-ne
> + ne.old1<-ne.old
> + ### sweeping forward
> + for(i in 1:ncl){
> + cmin<-cc[1]
> + cmax<-cc[2]
> + ### fixed cci<-cmax bug
> + cci <-1
> + lhood<-dbinom((i:ncl)-1,nc,pc[i])
> + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl]
> + Ep0 <-Epower(nc, d, ne, pc, alpha)
> + while(abs(Ep0[i]-power)>tol1){
> + if(Ep0[i]<power) cmin<-cci
> + else cmax<-cci
> + cci<-(cmax+cmin)/2
> + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl]
> + Ep0<-Epower(nc, d, ne, pc, alpha)
> + }
> +  ne.old1<-ne
> + }
> + ne1<-ne
> + ### sweeping backward -- ncl:i
> + ne.old2<-ne.old
> + ne     <-ne.old
> + for(i in ncl:1){
> + cmin<-cc[1]
> + cmax<-cc[2]
> + ### fixed cci<-cmax bug
> + cci <-1
> + lhood<-dbinom((ncl:i)-1,nc,pc[i])
> + lenl <-length(lhood)
> + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i]
> + Ep0 <-Epower(nc, d, cci*ne, pc, alpha)
> + while(abs(Ep0[i]-power)>tol1){
> + if(Ep0[i]<power) cmin<-cci
> + else cmax<-cci
> + cci<-(cmax+cmin)/2
> + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i]
> + Ep0<-Epower(nc, d, ne, pc, alpha)
> + }
> +  ne.old2<-ne
> + }
> + ne2<-ne
> + ne<-(ne1+ne2)/2
> + #cat(ccc*ne)
> + Ep1<-Epower(nc, d, ne, pc, alpha)
> + return(ne=ne, Ep=Ep1)
> + }
>> ###
>> vertex<-function(x,y)
> + { n<-length(x)
> + vx<-x[1]
> + vy<-y[1]
> + vp<-1
> + up<-T
> + for (i in (2:n))
> + { if (up)
> + { if (y[i-1] > y[i])
> + {vx<-c(vx,x[i-1])
> +  vy<-c(vy,y[i-1])
> +  vp<-c(vp,i-1)
> +  up<-F
> + }
> + }
> + else
> + { if (y[i-1] < y[i]) up<-T
> + }
> + }
> + vx<-c(vx,x[n])
> + vy<-c(vy,y[n])
> + vp<-c(vp,n)
> + return(vx=vx,vy=vy,vp=vp)
> + }
>> ###
>> convex<-function(x,y)
> + {
> + n<-length(x)
> + ans<-vertex(x,y)
> + len<-length(ans$vx)
> + while (len>3)
> + {
> + #cat("x=",x,"\n")
> + #cat("y=",y,"\n")
> + newx<-x[1:(ans$vp[2]-1)]
> + newy<-y[1:(ans$vp[2]-1)]
> + for (i in (2:(len-1)))
> + {
> +  newx<-c(newx,x[ans$vp[i]])
> + newy<-c(newy,y[ans$vp[i]])
> + }
> + newx<-c(newx,x[(ans$vp[len-1]+1):n])
> + newy<-c(newy,y[(ans$vp[len-1]+1):n])
> + y<-approx(newx,newy,xout=x)$y
> + #cat("new y=",y,"\n")
> + ans<-vertex(x,y)
> + len<-length(ans$vx)
> + #cat("vx=",ans$vx,"\n")
> + #cat("vy=",ans$vy,"\n")
> + }
> + return(wx=x,wy=y)}
>> ###
>> Epower<-function(nc, d, ne, pc = (0:((1 - d) * nc))/nc, alpha = 0.05)
> + {
> + #-------------------------------------
> + # nc     sample size in historical control
> + # d      the increase of response rate between historical and experiment
> + # ne     sample size of corresponding rc of 0 to nc*(1-d)
> + # pc     the response rate of control group, where we compute the
> + #        expected power
> + # alpha  the size of test
> + #-------------------------------------
> + kk <- length(pc)
> + rc <- 0:(nc * (1 - d))
> + pp <- rep(NA, kk)
> + ppp <- rep(NA, kk)
> + for(i in 1:(kk)) {
> + pe <- pc[i] + d
> + lhood <- dbinom(rc, nc, pc[i])
> + pp <- power1.f(rc, nc, ne, pe, alpha)
> + ppp[i] <- sum(pp * lhood)/sum(lhood)
> + }
> + return(ppp)
> + }
>>
>> # adapted from the old biss2
>> ss.rand<-function(rc,nc,d,alpha=.05,power=.8,tol=.01)
> + {
> + ne<-nc
> + ne1<-nc+50
> + while(abs(ne-ne1)>tol & ne1<100000){
> + ne<-ne1
> + pe<-d+rc/nc
> + ne1<-nef2(rc,nc,pe*ne,ne,alpha,power)
> +
> + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne))
> + }
> + if (ne1>100000) return(NA)
> + else return(ne1)
> + }
>> ###
>> power1.f<-function(rc,nc,ne,pie,alpha=0.05){
> + #-------------------------------------
> + # rc     number of response in historical control
> + # nc     sample size in historical control
> + # ne     sample size in experiment group
> + # pie    true response rate for experiment group
> + # alpha  size of the test
> + #-------------------------------------
> +
> + za<-qnorm(1-alpha)
> + re<-ne*pie
> + xe<-asin(sqrt((re+0.375)/(ne+0.75)))
> + xc<-asin(sqrt((rc+0.375)/(nc+0.75)))
> + ans<-za*sqrt(1+(ne+0.5)/(nc+0.5))-(xe-xc)/sqrt(1/(4*(ne+0.5)))
> + return(1-pnorm(ans))
> + }
>

--
Joshua Wiley
Ph.D.
Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

From petr.pikal at precheza.cz Wed Oct 5 08:29:42 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Wed, 5 Oct 2011 08:29:42 +0200 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID:

>
> On Tue, 4 Oct 2011, Sarah Goslee wrote:
>
> > You asked for pointers, and didn't provide a reproducible example, so I
> > offered a pointer.
>
> Sarah,
>
> I did not realize that your pointer was to the factor component of the
> subset() command.
>
> I think the most parsimonious thing for me to do is to modify the database
> table with a new column of the full stream name, then re-export and re-read
> into R.

Hm. I seldom use such an approach. In your original request you said you wanted to split your data into smaller data frames based on sites:

-----
I need to create subsets (as data frames) based on sites, but including all sites on each stream. For example, using the initial site factor shown
------

From what we know, it is difficult to say whether there is some common feature in the site variable. If it is organised like XY-N, you can simply make a new variable from the first two letters

sites <- substr(chemdata$site,1,2)

then you can split your data frame according to sites

chem.spl <- split(chemdata, sites)

and do anything with your split data frames, organised in a list.

Regards
Petr

>
> Thanks,
>
> Rich
>
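Petr's substr()/split() suggestion can be tried on a small made-up data frame. The site codes, the conc column and its values below are hypothetical (Rich's chemdata was not posted); the sketch only assumes that the first two characters of each site code identify the stream.

```r
# Hypothetical data: site codes of the form "XY-N" (stream prefix + site number)
chemdata <- data.frame(
  site = c("AB-1", "AB-2", "CD-1", "CD-3", "AB-4"),
  conc = c(0.12, 0.30, 0.25, 0.18, 0.09),
  stringsAsFactors = FALSE
)

# Stream identifier = first two characters of the site code
sites <- substr(chemdata$site, 1, 2)

# One data frame per stream, collected in a list named by stream
chem.spl <- split(chemdata, sites)

names(chem.spl)          # "AB" "CD"
nrow(chem.spl[["AB"]])   # 3

# The same summary applied to every stream-level data frame
stream.means <- sapply(chem.spl, function(df) mean(df$conc))
print(stream.means)
```

split() keeps all columns of each row, so every element of chem.spl is an ordinary data frame that can be passed to lm(), summary() and so on, or processed in one go with lapply()/sapply().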
From micahd at rams.colostate.edu Wed Oct 5 06:54:35 2011 From: micahd at rams.colostate.edu (MicahD) Date: Tue, 4 Oct 2011 21:54:35 -0700 (PDT) Subject: [R] Running a GMM Estimation on dynamic Panel Model using plm-Package In-Reply-To: <1307907816544-3592466.post@n4.nabble.com> References: <1307907816544-3592466.post@n4.nabble.com> Message-ID: <1317790475197-3873431.post@n4.nabble.com>

Hi bstudent,

I've had the same problem and I wish there were a definitive answer, as this seems to be the #1 problem with the package, and pgmm would be awesome for economists if we could figure out how to work it!

I'm no expert on GMM, but from what I've gathered from other posts, the problem may stem from your panel data being a "long panel", with more time periods than cross-sectional (aka individual-level) observations. If that happens, then there's a problem with the number of instruments used in the Arellano-Bond estimator. I'm pretty sure you can determine exactly when it would be a problem and what size your data set has to be, but you might have to learn about the asymptotics of the Arellano-Bond estimator. Maybe someday someone who knows more about GMM will tell us how to figure this one out.

I love this package, though, and panel data is at the pinnacle of dynamic empirical analysis in economics, so I wish someone could come up with more detailed instructions for non-experts. Panel GMM is becoming widely known as the most efficient estimator of panel data regressions, and I think I'd ask the plm package to marry me if I ever found out how to work pgmm.

Here's a link to a good paper on "optimal instruments" by Arellano:
http://www.cemfi.es/~arellano/siv2004.pdf
Instrumental Variables for Dynamic Panel Models - Arellano (2004)

- Micah

ps I simply reverted to Stata in order to get my Panel GMM estimations.
--
View this message in context: http://r.789695.n4.nabble.com/Running-a-GMM-Estimation-on-dynamic-Panel-Model-using-plm-Package-tp3592466p3873431.html
Sent from the R help mailing list archive at Nabble.com.

From enricoschumann at yahoo.de Wed Oct 5 06:36:15 2011 From: enricoschumann at yahoo.de (Enrico Schumann) Date: Wed, 05 Oct 2011 06:36:15 +0200 Subject: [R] Create combinations of rows In-Reply-To: <1317766875225-3872641.post@n4.nabble.com> References: <1317766875225-3872641.post@n4.nabble.com> Message-ID: <4E8BDEBF.2000507@yahoo.de>

?expand.grid

On 05.10.2011 00:21, darkgaze wrote:
> I don't quite know how to word what I want, but if I have
>
> (1, 2, 3); (a, b, c); (x, y)
>
> I want:
>
> 1 a x
> 1 b x
> 1 c x
> 1 a y
> 1 b y
> 1 c y
> 2 a
> ...
>
> and so forth
>
> What is the appropriate command?
>
> Best,
> Don
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Create-combinations-of-rows-tp3872641p3872641.html
> Sent from the R help mailing list archive at Nabble.com.
>

--
Enrico Schumann
Lucerne, Switzerland
http://nmof.net/

From dmfall2010 at yahoo.com Wed Oct 5 07:06:00 2011 From: dmfall2010 at yahoo.com (William Claster) Date: Tue, 4 Oct 2011 22:06:00 -0700 (PDT) Subject: [R] (no subject) Message-ID: <1317791160.2482.YahooMailNeo@web113409.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed... Name: not available URL:

From koshihaku at gmail.com Wed Oct 5 07:54:20 2011 From: koshihaku at gmail.com (koshihaku) Date: Tue, 4 Oct 2011 22:54:20 -0700 (PDT) Subject: [R] Is the output of survfit.coxph survival or baseline survival?
In-Reply-To: <1317432668597-3861919.post@n4.nabble.com> References: <1317432668597-3861919.post@n4.nabble.com> Message-ID: <1317794060812-3873512.post@n4.nabble.com>

Dear all,

Your advice was a great help to my study. Thank you very much!

--
View this message in context: http://r.789695.n4.nabble.com/Is-the-output-of-survfit-coxph-survival-or-baseline-survival-tp3861919p3873512.html
Sent from the R help mailing list archive at Nabble.com.

From koshihaku at gmail.com Wed Oct 5 07:56:39 2011 From: koshihaku at gmail.com (koshihaku) Date: Tue, 4 Oct 2011 22:56:39 -0700 (PDT) Subject: [R] How to get the hazard of coxph (not cumulative hazard) Message-ID: <1317794199280-3873516.post@n4.nabble.com>

Dear all,

I think coxph and survfit.coxph can give the cumulative hazard of a Cox model. But is there any method to calculate the hazard lambda(t) = lambda_0(t)*exp{beta*X(t)}? Any suggestion would be a great help.

Thank you very much!
Koshihaku

--
View this message in context: http://r.789695.n4.nabble.com/How-to-get-the-hazard-of-coxph-not-cumulative-hazard-tp3873516p3873516.html
Sent from the R help mailing list archive at Nabble.com.

From linshuang11 at gmail.com Wed Oct 5 07:26:22 2011 From: linshuang11 at gmail.com (sevenfrost) Date: Tue, 4 Oct 2011 22:26:22 -0700 (PDT) Subject: [R] cuhre usage ?? multidimensional integration Message-ID: <1317792382956-3873478.post@n4.nabble.com>

my=function(x){
len=1
for(i in 1:len){
y[i]=x[i]
}
g=1
w=NULL
t=NULL
for(i in 1:len)w[i]=x[i+len]
for(i in 1:len)t[i]=x[i+2*len]
for(i in 1:len)g=g*dnorm(y[i])*dnorm(w[i])*dnorm(z[i])
return(g)
}
cuhre(6,1,my,rep(-100,6),rep(100,6))

Error in crff(match.call(), integrand, "cuhre", libargs, ...) :
  Additional argument not expected in the integrand function

After changing the function to my=function(x,g,i,j), the result is not right: it should be 1, but it turns out to be 0.039...

How can I make this work? Thank you!
--
View this message in context: http://r.789695.n4.nabble.com/cuhre-usage-multidimensional-integration-tp3873478p3873478.html
Sent from the R help mailing list archive at Nabble.com.

From spencer.graves at structuremonitoring.com Wed Oct 5 09:00:06 2011 From: spencer.graves at structuremonitoring.com (Spencer Graves) Date: Wed, 05 Oct 2011 00:00:06 -0700 Subject: [R] SPlus to R In-Reply-To: References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: <4E8C0076.5030605@structuremonitoring.com>

When R was invented, nearly all of the core R functions were written to produce exactly the same answers as those returned by S-Plus. Some very minor exceptions were made for time series functions, for example, where better algorithms in R produced slightly better fits. There may be other, less benign examples, but none come to mind at the moment.

Some of the core R functions have arguments that were not available in S-Plus at the time they were written. For example, probability functions included log and lower.tail arguments that, if my memory is correct, were not available in S-Plus at that time. More recent versions of S-Plus can reportedly run R scripts and packages without change.

One set of examples comes to mind where you can find subtle differences between S-Plus and R: If you get Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer) and try to run the code exactly as written in the book, you may get the answers in the book under S-Plus but not in a few cases in R. However, the nlme package companion to the book includes a "scripts" directory containing files required to obtain all the results in that book. In most cases, the functions and syntax, etc., are exactly the same in S-Plus and R. However, there are a very few cases where the answers are NOT the same, because some defaults have changed. I don't remember the details now.
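One concrete S-Plus/R difference is relevant to the script in this thread. Several of its functions end with calls such as return(ne=ans$ne, Ep=ans$Ep) or return(vx=vx,vy=vy,vp=vp). S-Plus accepted these, but R stops with "multi-argument returns are not permitted" when such a function is called, so the values have to be wrapped in list(). Defining a function also produces no output in itself; results appear only when the function is called. The sketch below uses a hypothetical two-value function, not the actual sshc() code.

```r
# Hypothetical stand-in for functions like c.searchd(), which end with
# return(ne = ne, Ep = Ep1). In R, a return() with more than one argument
# is an error, so the values are collected in a named list instead.
two_results <- function(n) {
  ne <- seq_len(n) * 10     # placeholder "sample sizes"
  Ep <- rep(0.8, n)         # placeholder "expected powers"
  return(list(ne = ne, Ep = Ep))
}

# The S-Plus style is accepted when defined, but fails when the function is called
bad <- function() return(ne = 1, Ep = 2)
msg <- tryCatch(bad(), error = function(e) conditionMessage(e))
print(msg)   # "multi-argument returns are not permitted"

# Defining functions prints nothing; results appear only when you call them
res <- two_results(3)
print(res$ne)   # 10 20 30
print(res$Ep)   # 0.8 0.8 0.8
```

Applying the same list() wrapping to every multi-value return() in the posted script (in sshc(), c.searchd(), vertex() and convex()) and then calling, for example, sshc(nc = 500, d = .5, method = 3) should make the printed and plotted output appear. This suggestion has not been checked against the full script.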
If you have an S-Plus script in which you replaced "_" by "<-", that should take care of most if not all of the required conversion. However, you need to check. If it breaks when you try to run it, find where it breaks, read the documentation and try to understand why it breaks and how to fix it. The "debug" function in R can help immensely with this: If you say debug(fn), then each time fn is called, it will stop and allow you to walk through fn line by line, evaluating what it does, even changing the code and values of variables on the fly if you like. This can help you fix anything that breaks AND understand whether you are still getting the correct answer.

Hope this helps.
Spencer

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com

From marcel.au at web.de Wed Oct 5 09:57:59 2011 From: marcel.au at web.de (Marcel.)
Date: Wed, 5 Oct 2011 00:57:59 -0700 (PDT) Subject: [R] experimenting (like Weka Experimenter) In-Reply-To: <1317783730.68045.YahooMailNeo@web113415.mail.gq1.yahoo.com> References: <1317783730.68045.YahooMailNeo@web113415.mail.gq1.yahoo.com> Message-ID: <1317801479787-3873741.post@n4.nabble.com> Hello, I think you are talking about a general workflow environment that can execute R methods and arrange different statistical methods in a flow (graph). Here is a list of links to such (open-source) programs. *Knime:* http://www.knime.org/ *RapidMiner:* http://rapid-i.com/content/view/181/190/ *Red-R:* http://www.red-r.org/ *R AnalyticFlow:* http://www.ef-prime.com/products/ranalyticflow_en/ There are also many other programs that use R, but in other contexts (GIS, scientific modelling, etc.). I hope this information helps. -- View this message in context: http://r.789695.n4.nabble.com/experimenting-like-Weka-Experimenter-tp3873363p3873741.html Sent from the R help mailing list archive at Nabble.com. From sina.rueeger at gmail.com Wed Oct 5 09:56:35 2011 From: sina.rueeger at gmail.com (sina.r) Date: Wed, 5 Oct 2011 00:56:35 -0700 (PDT) Subject: [R] (no subject) In-Reply-To: <1317791160.2482.YahooMailNeo@web113409.mail.gq1.yahoo.com> References: <1317791160.2482.YahooMailNeo@web113409.mail.gq1.yahoo.com> Message-ID: <1317801395890-3873738.post@n4.nabble.com> Hi William, Try double backslashes "\\" or forward slashes "/" (see FAQ 2.16: http://cran.r-project.org/bin/windows/base/rw-FAQ.html) install.packages("C:\\Users\\rusa\\DMwR_0.2.1.zip", repos = NULL) Regards, Sina -- View this message in context: http://r.789695.n4.nabble.com/no-subject-tp3873600p3873738.html Sent from the R help mailing list archive at Nabble.com.
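The escaping rule behind Sina's advice can be checked in base R without installing anything. The sketch below reuses the illustrative path from Sina's reply. It shows that `"\\"` in R source code is a single backslash character, and that `file.path()` produces the forward-slash spelling, which R on Windows also accepts. The `install.packages()` call is left commented out.

```r
# Illustrative path from Sina's reply; nothing is installed here.
# In R source code "\\" is an escape sequence that yields ONE backslash.
p_backslash <- "C:\\Users\\rusa\\DMwR_0.2.1.zip"
# Forward slashes need no escaping and are accepted on Windows as well.
p_forward <- "C:/Users/rusa/DMwR_0.2.1.zip"
# file.path() joins components with "/" on every platform.
p_joined <- file.path("C:", "Users", "rusa", "DMwR_0.2.1.zip")

nchar("\\")                      # 1: a single backslash character
cat(p_backslash, "\n")           # C:\Users\rusa\DMwR_0.2.1.zip
identical(p_forward, p_joined)   # TRUE
# Both spellings name the same path once the separators are unified:
identical(gsub("\\", "/", p_backslash, fixed = TRUE), p_forward)  # TRUE

# The call from the thread (not run here):
# install.packages(p_forward, repos = NULL)
```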
From Thierry.ONKELINX at inbo.be Wed Oct 5 10:05:28 2011 From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry) Date: Wed, 5 Oct 2011 08:05:28 +0000 Subject: [R] (no subject) In-Reply-To: <1317791160.2482.YahooMailNeo@web113409.mail.gq1.yahoo.com> References: <1317791160.2482.YahooMailNeo@web113409.mail.gq1.yahoo.com> Message-ID: Dear William, Please use a more informative subject line (as the posting guide asks you to do). Backslashes have a special meaning in R character strings: they start escape sequences. In file paths you therefore need to use either double backslashes or forward slashes: 'C:\\Users\\Bill\\Desktop\\DMwR_0.2.1.zip' 'C:/Users/Bill/Desktop/DMwR_0.2.1.zip' Best regards, Thierry PS It might be time to upgrade your R version. R 2.10.1 is quite old, and R 2.14.0 will be available within a month. > -----Original message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On behalf of William Claster > Sent: Wednesday 5 October 2011 7:06 > To: r-help at r-project.org > Subject: [R] (no subject) > > Hi. I am trying to install using the following. Can someone suggest what is > wrong? I am using Windows 7 64bit, and R 2.10.1 > > ('C:\Users\Bill\Desktop\DMwR_0.2.1.zip', repos=NULL ) Warning in > install.packages("C:UsersBillDesktopDMwR_0.2.1.zip", repos = NULL) : > argument 'lib' is missing: using 'C:\Users\Bill\Documents/R/win-library/2.10' > Error in zip.unpack(pkg, tmpDir) : > zipfile 'C:UsersBillDesktopDMwR_0.2.1.zip' not found In addition: Warning > messages: > 1: \U used without hex digits > 2: '\B' is an unrecognized escape in a character string > 3: '\D' is an unrecognized escape in a character string > 4: '\D' is an unrecognized escape in a character string > 5: unrecognized escapes removed from > "C:\Users\Bill\Desktop\DMwR_0.2.1.zip" > > > > I also tried to install using the install menu. > > Thank you.
> [[alternative HTML version deleted]] From paul.hiemstra at knmi.nl Wed Oct 5 10:11:35 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Wed, 05 Oct 2011 08:11:35 +0000 Subject: [R] need help on melt/cast In-Reply-To: References: Message-ID: <4E8C1137.90500@knmi.nl> On 09/22/2011 01:54 PM, Eugene Kanshin wrote: > Hello, > I need to convert dataframe from: > > ID T0 T1 T2 > A 1 2 3 > B 4 5 6 > C 7 8 9 > > to: > > ID Variable Value > A T0 1 > A T1 2 > A T2 3 > B T0 4 > B T1 5 > B T2 6 > C T0 7 > C T1 8 > C T2 9 > > i tried to use melt cast but it gives me all the time not exactly what I > need. > Thank you. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. Hi, I see you already got your answer, but maybe for future questions... You mention "I tried to use melt and cast", but you do not provide example code that shows what you tried. This is important for us to understand where your reasoning about melt and cast goes wrong, especially because a simple melt(dat) (as David mentioned) got you the right answer. Good luck, Paul -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O.
Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From paul.hiemstra at knmi.nl Wed Oct 5 10:15:32 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Wed, 05 Oct 2011 08:15:32 +0000 Subject: [R] experimenting (like Weka Experimenter) In-Reply-To: <1317783730.68045.YahooMailNeo@web113415.mail.gq1.yahoo.com> References: <1317783730.68045.YahooMailNeo@web113415.mail.gq1.yahoo.com> Message-ID: <4E8C1224.5060000@knmi.nl> On 10/05/2011 03:02 AM, William Claster wrote: > Hi. I am not that good at R but I was wondering if there is either a tool or a strategy for testing many different models in R in a batch. I have used something in Weka called the Experimenter interface which helps with doing this kind of thing. > > Thank you. > Andy > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. Hi, If you want to stay within pure R, look at the plyr and foreach packages. Good luck, Paul -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From pburns at pburns.seanet.com Wed Oct 5 10:15:53 2011 From: pburns at pburns.seanet.com (Patrick Burns) Date: Wed, 05 Oct 2011 09:15:53 +0100 Subject: [R] SPlus to R In-Reply-To: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: <4E8C1239.9050304@pburns.seanet.com> I had thought that the problem might be: return(ne=ne1) since R doesn't support that any more.
But when I tried it, I got results (just without the name on the output). Better would be to change that line to: list(ne=ne1) ('return' is seldom necessary in either R or S+.) I'd suggest putting 'print' or 'cat' statements in to try to figure out where things go wrong. You'll find other hints in 'S Poetry' and 'The R Inferno'. There might be positive probability that at least one of those hints will be useful. Both those are available on www.burns-stat.com On 05/10/2011 02:53, Scott Raynaud wrote: > I'm trying to convert an S-Plus program to R. Since I'm a SAS programmer I'm not facile is either S-Plus or R, so I need some help. All I did was convert the underscores in S-Plus to the assignment operator<-. Here are the first few lines of the S-Plus file: > > sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, > tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > { > ### for method 1 > if (method==1) { > ne1 _ ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > return(ne=ne1) > } > > > My translation looks like this: > > sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > { > ### for method 1 > if (method==1) { > ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > return(ne=ne1) > } > > The program runs without throwing errors, but I'm not getting any ourput in the console. This is where it should be, right? I think I have this set up correctly. I'm using method=3 which only requires nc and d to be specified. Any ideas why I'm not seeing output? > > Here is the entire output: > >> ## sshc.ssc: sample size calculation for historical control studies >> ## J. Jack Lee (jjlee at mdanderson.org) and Chi-hong Tseng >> ## Department of Biostatistics, Univ. of Texas M.D. 
Anderson Cancer Center >> ## >> ## 3/1/99 >> ## updated 6/7/00: add loess >> ##------------------------------------------------------------------ >> ######## Required Input: >> # >> # rc number of response in historical control group >> # nc sample size in historical control >> # d target improvement = Pe - Pc >> # method 1=method based on the randomized design >> # 2=Makuch& Simon method (Makuch RW, Simon RM. Sample size considerations >> # for non-randomized comparative studies. J of Chron Dis 1980; 3:175-181. >> # 3=uniform power method >> ######## optional Input: >> # >> # alpha size of the test >> # power desired power of the test >> # tol convergence criterion for methods 1& 2 in terms of sample size >> # tol1 convergence criterion for method 3 at any given obs Rc in terms of difference >> # of expected power from target >> # tol2 overall convergence criterion for method 3 as the max absolute deviation >> # of expected power from target for all Rc >> # cc range of multiplicative constant applied to the initial values ne >> # l.span smoothing constant for loess >> # >> # Note: rc is required for methods 1 and 2 but not 3 >> # method 3 return the sample size need for rc=0 to (1-d)*nc >> # >> ######## Output >> # for methdos 1& 2: return the sample size needed for the experimental group (1 number) >> # for given rc, nc, d, alpha, and power >> # for method 3: return the profile of sample size needed for given nc, d, alpha, and power >> # vector $ne contains the sample size corresponding to rc=0, 1, 2, ... 
nc*(1-d) >> # vector $Ep contains the expected power corresponding to >> # the true pc = (0, 1, 2, ..., nc*(1-d)) / nc >> # >> #------------------------------------------------------------------ >> sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > + tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > + { > + ### for method 1 > + if (method==1) { > + ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > + return(ne=ne1) > + } > + ### for method 2 > + if (method==2) { > + ne<-nc > + ne1<-nc+50 > + while(abs(ne-ne1)>tol& ne1<100000){ > + ne<-ne1 > + pe<-d+rc/nc > + ne1<-nef(rc,nc,pe*ne,ne,alpha,power) > + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne)) > + } > + if (ne1>100000) return(NA) > + else return(ne=ne1) > + } > + ### for method 3 > + if (method==3) { > + if (tol1> tol2/10) tol1<-tol2/10 > + ncstar<-(1-d)*nc > + pc<-(0:ncstar)/nc > + ne<-rep(NA,ncstar + 1) > + for (i in (0:ncstar)) > + { ne[i+1]<-ss.rand(i,nc,d,alpha=.05,power=.8,tol=.01) > + } > + plot(pc,ne,type='l',ylim=c(0,max(ne)*1.5)) > + ans<-c.searchd(nc, d, ne, alpha, power, cc, tol1) > + ### check overall absolute deviance > + old.abs.dev<-sum(abs(ans$Ep-power)) > + ##bad<-0 > + print(round(ans$Ep,4)) > + print(round(ans$ne,2)) > + lines(pc,ans$ne,lty=1,col=8) > + old.ne<-ans$ne > + ##while(max(abs(ans$Ep-power))>tol2& bad==0){ #### unnecessary ## > + while(max(abs(ans$Ep-power))>tol2){ > + ans<-c.searchd(nc, d, ans$ne, alpha, power, cc, tol1) > + abs.dev<-sum(abs(ans$Ep-power)) > + print(paste(" old.abs.dev=",old.abs.dev)) > + print(paste(" abs.dev=",abs.dev)) > + ##if (abs.dev> old.abs.dev) { bad<-1} > + old.abs.dev<-abs.dev > + print(round(ans$Ep,4)) > + print(round(ans$ne,2)) > + lines(pc,old.ne,lty=1,col=1) > + lines(pc,ans$ne,lty=1,col=8) > + ### add convex > + ans$ne<-convex(pc,ans$ne)$wy > + ### add loess > + ###old.ne<-ans$ne > + loess.ne<-loess(ans$ne ~ pc, span=l.span) > + lines(pc,loess.ne$fit,lty=1,col=4) > + old.ne<-loess.ne$fit > + 
###readline() > + } > + return(ne=ans$ne, Ep=ans$Ep) > + } > + } >> >> ## needed for method 1 >> nef2<-function(rc,nc,re,ne,alpha,power){ > + za<-qnorm(1-alpha) > + zb<-qnorm(power) > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > + ans<- 1/(4*(xc-xe)^2/(za+zb)^2-1/(nc+0.5)) - 0.5 > + return(ans) > + } >> ## needed for method 2 >> nef<-function(rc,nc,re,ne,alpha,power){ > + za<-qnorm(1-alpha) > + zb<-qnorm(power) > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > + ans<-(za*sqrt(1+(ne+0.5)/(nc+0.5))+zb)^2/(2*(xe-xc))^2-0.5 > + return(ans) > + } >> ## needed for method 3 >> c.searchd<-function(nc, d, ne, alpha=0.05, power=0.8, cc=c(0.1,2),tol1=0.0001){ > + #--------------------------- > + # nc sample size of control group > + # d the difference to detect between control and experiment > + # ne vector of starting sample size of experiment group > + # corresponding to rc of 0 to nc*(1-d) > + # alpha size of test > + # power target power > + # cc pre-screen vector of constant c, the range should cover the > + # the value of cc that has expected power > + # tol1 the allowance between the expected power and target power > + #--------------------------- > + pc<-(0:((1-d)*nc))/nc > + ncl<-length(pc) > + ne.old<-ne > + ne.old1<-ne.old > + ### sweeping forward > + for(i in 1:ncl){ > + cmin<-cc[1] > + cmax<-cc[2] > + ### fixed cci<-cmax bug > + cci<-1 > + lhood<-dbinom((i:ncl)-1,nc,pc[i]) > + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl] > + Ep0<-Epower(nc, d, ne, pc, alpha) > + while(abs(Ep0[i]-power)>tol1){ > + if(Ep0[i]<power) cmin<-cci > + else cmax<-cci > + cci<-(cmax+cmin)/2 > + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl] > + Ep0<-Epower(nc, d, ne, pc, alpha) > + } > + ne.old1<-ne > + } > + ne1<-ne > + ### sweeping backward -- ncl:i > + ne.old2<-ne.old > + ne<-ne.old > + for(i in ncl:1){ > + cmin<-cc[1] > + cmax<-cc[2] > + ### fixed cci<-cmax bug > + cci<-1 > + lhood<-dbinom((ncl:i)-1,nc,pc[i]) > +
lenl<-length(lhood) > + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i] > + Ep0<-Epower(nc, d, cci*ne, pc, alpha) > + while(abs(Ep0[i]-power)>tol1){ > + if(Ep0[i]<power) cmin<-cci > + else cmax<-cci > + cci<-(cmax+cmin)/2 > + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i] > + Ep0<-Epower(nc, d, ne, pc, alpha) > + } > + ne.old2<-ne > + } > + ne2<-ne > + ne<-(ne1+ne2)/2 > + #cat(ccc*ne) > + Ep1<-Epower(nc, d, ne, pc, alpha) > + return(ne=ne, Ep=Ep1) > + } >> ### >> vertex<-function(x,y) > + { n<-length(x) > + vx<-x[1] > + vy<-y[1] > + vp<-1 > + up<-T > + for (i in (2:n)) > + { if (up) > + { if (y[i-1]> y[i]) > + {vx<-c(vx,x[i-1]) > + vy<-c(vy,y[i-1]) > + vp<-c(vp,i-1) > + up<-F > + } > + } > + else > + { if (y[i-1]< y[i]) up<-T > + } > + } > + vx<-c(vx,x[n]) > + vy<-c(vy,y[n]) > + vp<-c(vp,n) > + return(vx=vx,vy=vy,vp=vp) > + } >> ### >> convex<-function(x,y) > + { > + n<-length(x) > + ans<-vertex(x,y) > + len<-length(ans$vx) > + while (len>3) > + { > + #cat("x=",x,"\n") > + #cat("y=",y,"\n") > + newx<-x[1:(ans$vp[2]-1)] > + newy<-y[1:(ans$vp[2]-1)] > + for (i in (2:(len-1))) > + { > + newx<-c(newx,x[ans$vp[i]]) > + newy<-c(newy,y[ans$vp[i]]) > + } > + newx<-c(newx,x[(ans$vp[len-1]+1):n]) > + newy<-c(newy,y[(ans$vp[len-1]+1):n]) > + y<-approx(newx,newy,xout=x)$y > + #cat("new y=",y,"\n") > + ans<-vertex(x,y) > + len<-length(ans$vx) > + #cat("vx=",ans$vx,"\n") > + #cat("vy=",ans$vy,"\n") > + } > + return(wx=x,wy=y)} >> ### >> Epower<-function(nc, d, ne, pc = (0:((1 - d) * nc))/nc, alpha = 0.05) > + { > + #------------------------------------- > + # nc sample size in historical control > + # d the increase of response rate between historical and experiment > + # ne sample size of corresponding rc of 0 to nc*(1-d) > + # pc the response rate of control group, where we compute the > + # expected power > + # alpha the size of test > + #------------------------------------- > + kk<- length(pc) > + rc<- 0:(nc * (1 - d)) > + pp<- rep(NA, kk) > + ppp<- rep(NA, kk) > + for(i
in 1:(kk)) { > + pe<- pc[i] + d > + lhood<- dbinom(rc, nc, pc[i]) > + pp<- power1.f(rc, nc, ne, pe, alpha) > + ppp[i]<- sum(pp * lhood)/sum(lhood) > + } > + return(ppp) > + } >> >> # adapted from the old biss2 >> ss.rand<-function(rc,nc,d,alpha=.05,power=.8,tol=.01) > + { > + ne<-nc > + ne1<-nc+50 > + while(abs(ne-ne1)>tol& ne1<100000){ > + ne<-ne1 > + pe<-d+rc/nc > + ne1<-nef2(rc,nc,pe*ne,ne,alpha,power) > + > + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne)) > + } > + if (ne1>100000) return(NA) > + else return(ne1) > + } >> ### >> power1.f<-function(rc,nc,ne,pie,alpha=0.05){ > + #------------------------------------- > + # rc number of response in historical control > + # nc sample size in historical control > + # ne sample size in experiment group > + # pie true response rate for experiment group > + # alpha size of the test > + #------------------------------------- > + > + za<-qnorm(1-alpha) > + re<-ne*pie > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > + ans<-za*sqrt(1+(ne+0.5)/(nc+0.5))-(xe-xc)/sqrt(1/(4*(ne+0.5))) > + return(1-pnorm(ans)) > + } > > [[alternative HTML version deleted]] > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
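To make the suggestion above concrete: the posted code ends several functions with S-Plus-style `return(ne=..., Ep=...)`. Recent R versions stop with an error on such multi-argument returns; older releases only warned that the form was deprecated. The R idiom is to end the function with a named list. `old_style()` and `new_style()` below are hypothetical stand-ins, not part of the posted code. The first is only called inside `tryCatch()` to show that R rejects it.

```r
# S-Plus idiom: return several named values directly.
# Recent R versions signal an error here ("multi-argument returns are not permitted").
old_style <- function(ne, Ep) return(ne = ne, Ep = Ep)

# R idiom: bundle the results in a named list; the last expression is the
# function's value, so an explicit return() is unnecessary.
new_style <- function(ne, Ep) list(ne = ne, Ep = Ep)

failed <- tryCatch({ old_style(10, 0.8); FALSE },
                   error = function(e) TRUE)
res <- new_style(ne = c(10.5, 12.25), Ep = c(0.80, 0.81))

failed   # [1] TRUE
res$ne   # [1] 10.50 12.25
res$Ep   # [1] 0.80 0.81
```

Applied to the posted code, `return(ne=ans$ne, Ep=ans$Ep)` in sshc() would become `list(ne=ans$ne, Ep=ans$Ep)`. The same change applies to the final lines of c.searchd(), vertex() and convex().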
-- Patrick Burns pburns at pburns.seanet.com twitter: @portfolioprobe http://www.portfolioprobe.com/blog http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno') From paul.hiemstra at knmi.nl Wed Oct 5 10:17:37 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Wed, 05 Oct 2011 08:17:37 +0000 Subject: [R] Problems loading package hydroTSM In-Reply-To: <036801cc82e4$1ef26c20$5cd74460$@gmail.com> References: <036801cc82e4$1ef26c20$5cd74460$@gmail.com> Message-ID: <4E8C12A1.7040803@knmi.nl> Hi, What happens when you update R under Windows 7 to 2.12.1? Also take a look at the posting guide [1] for tips on what kind of information you need to provide. In particular, the output of sessionInfo() under both Mac and Windows would be useful. Cheers, Paul [1] http://www.R-project.org/posting-guide.html On 10/04/2011 10:22 PM, Eduardo M. A. M.Mendes wrote: > Hello > > > > I have the following problem when loading the package hydroGOF on Windows 7 > running R.12.2 > > > > library(hydroGOF) > > Error : package 'hydroTSM' does not have a name space > > Error: package/namespace load failed for 'hydroGOF' > > > > The same command does not result in error on R.13.1 at my Mac running Lion. > > > > Have I done something wrong? > > > > Many thanks > > > > Ed > > > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O.
Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From b.rowlingson at lancaster.ac.uk Wed Oct 5 11:02:52 2011 From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson) Date: Wed, 5 Oct 2011 10:02:52 +0100 Subject: [R] SPlus to R In-Reply-To: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: On Wed, Oct 5, 2011 at 2:53 AM, Scott Raynaud wrote: > I'm trying to convert an S-Plus program to R. Since I'm a SAS programmer I'm not facile is either S-Plus or R, so I need some help. All I did was convert the underscores in S-Plus to the assignment operator <-. Here are the first few lines of the S-Plus file: > > sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, > tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > { > ### for method 1 > if (method==1) { > ne1 _ ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > return(ne=ne1) > } > > > My translation looks like this: > > sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > { > ### for method 1 > if (method==1) { > ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > return(ne=ne1) > } > > The program runs without throwing errors, but I'm not getting any ourput in the console. This is where it should be, right? I think I have this set up correctly. I'm using method=3 which only requires nc and d to be specified. Any ideas why I'm not seeing output? Long shot: the code you posted looked like (and hard to tell without indentation) just a bunch of function definitions. R won't actually do anything unless you call those functions with some parameters. So, when you say you get no output when you 'run' the code, what exactly do you mean by 'run' the code? What I would do is: 1.
Put the code in a file called 'whatever.R'. 2. Start R, and do source("whatever.R"). That defines the functions. Do "ls()" and you should see them. 3. Call one of the functions: sshc(100,10) I'd call that, in R terms, "calling the sshc function" rather than running anything. Barry From paul.hiemstra at knmi.nl Wed Oct 5 11:03:53 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Wed, 05 Oct 2011 09:03:53 +0000 Subject: [R] reporting multiple objects out of a function In-Reply-To: <1317788874982-3873380.post@n4.nabble.com> References: <1317788874982-3873380.post@n4.nabble.com> Message-ID: <4E8C1D79.8080205@knmi.nl> On 10/05/2011 04:27 AM, andrewH wrote: > Dear folks, > > I'm trying to build a function to create and make available some variables I > frequently use for testing purposes. Suppose I have a function that takes > some inputs and creates (internally) several named objects. Say, > > fun1 <- function(x, y, z) {obj1 <- x; obj2 <- y; obj3 <- z > > } > > Here is the challenge: After I run it, I want the objects to be available in > the calling environment, but not necessarily in the global environment. I > want them to be individually available, not as part of a list or some larger > object. I can not figure out how to do this. If I understand the situation > correctly, I am trying to move several separate objects from the environment > of the function to the environment in which the function was invoked (the > "calling environment," yes?). > > I'm pretty sure there is a command to do this, but I'm not sure how to find > it. Any help would be greatly appreciated, either on the necessary code, or > on how to search for it, or a reference to a good discussion of this family > of problems. > > Sincerely, andrewH > > > -- > View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3873380.html > Sent from the R help mailing list archive at Nabble.com.
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. Hi, You can use the <<- operator, although in the spirit of fortune(106): "If <<- is the answer, rethink your question...". Why is it important that your objects are not part of a larger object, which is standard practice in R? You could also take a look at the attach() command: bla = fun1(x,y,z) attach(bla) Note that for this, fun1 needs to return your objects in a list. However, I'm still not in favor of using this approach... Good luck, Paul -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From mentor_ at gmx.net Wed Oct 5 11:27:30 2011 From: mentor_ at gmx.net (syrvn) Date: Wed, 5 Oct 2011 02:27:30 -0700 (PDT) Subject: [R] texi2dvi problem when compiling incorrect Latex code In-Reply-To: References: <1317732235196-3870827.post@n4.nabble.com> Message-ID: <1317806850526-3873909.post@n4.nabble.com> Hi Ista, thanks for your reply. If I understood correctly, you run R within Eclipse but use Rterm rather than RJ as the Launch Type. I changed my configuration so that R is now launched as Rterm and NOT as RJ, and I also removed quiet=FALSE from my configuration. Unfortunately, I still have the same problem. To create an error in my LaTeX code I just typed the following: \asdasd in one of my .tex files.
When I compile my document by hand using the Mac OS X / UNIX terminal I get the following latex compiling output: Underfull \hbox (badness 10000) in paragraph at lines 4--7 ) (../abstract.tex Underfull \hbox (badness 10000) in paragraph at lines 1--12 ../abstract.tex:15: Undefined control sequence. l.15 \asdasdas ? It stops at the question mark and waits for user input. If I press enter it continues and finally stops with an error which is fine. The only problem is that if I do it in R the console does not print everything until the question mark and therefore I cannot just press enter to let latex finish compiling the code. I don't know how to get around this. Best syrvn -- View this message in context: http://r.789695.n4.nabble.com/texi2dvi-problem-when-compiling-incorrect-Latex-code-tp3870827p3873909.html Sent from the R help mailing list archive at Nabble.com. From rubenbauar at gmx.de Wed Oct 5 11:49:33 2011 From: rubenbauar at gmx.de (Chris82) Date: Wed, 5 Oct 2011 02:49:33 -0700 (PDT) Subject: [R] optimize R code: replace for loop Message-ID: <1317808173267-3873945.post@n4.nabble.com> Dear R Users, at the moment I am trying to optimize an R script. testvec <- c(0,1,0,1,1,1,1,0,0,1,0,1,0) sum.testvec <- vector() tempsum <- 1 for (e in 1:length(testvec)){ sum.testvec[e] <- tempsum+testvec[e] tempsum <- sum.testvec[e] } final.sum <- c(1,sum.testvec) Is there an option to do something with apply? Unfortunately I am not so familiar with the apply functions. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/optimize-R-code-replace-for-loop-tp3873945p3873945.html Sent from the R help mailing list archive at Nabble.com. 
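For comparison with the replies that follow: the loop in the question builds a running total seeded with 1, which is exactly `cumsum(c(1, testvec))`. Since the poster asks about apply-style functions, a base-R sketch with `Reduce(..., accumulate = TRUE)` is also included as the closest functional equivalent. `cumsum()` remains the simple vectorized choice. The names `via_cumsum` and `via_reduce` are illustrative.

```r
testvec <- c(0,1,0,1,1,1,1,0,0,1,0,1,0)

# The loop from the question: running total seeded with 1
sum.testvec <- numeric(length(testvec))
tempsum <- 1
for (e in seq_along(testvec)) {
  sum.testvec[e] <- tempsum + testvec[e]
  tempsum <- sum.testvec[e]
}
final.sum <- c(1, sum.testvec)

# Vectorized: cumulative sum of the seed followed by the increments
via_cumsum <- cumsum(c(1, testvec))

# Functional: with accumulate = TRUE, Reduce() keeps every intermediate
# result; init = 1 makes the seed the first element, as in final.sum
via_reduce <- Reduce(`+`, testvec, 1, accumulate = TRUE)

all.equal(final.sum, via_cumsum)   # TRUE
all.equal(final.sum, via_reduce)   # TRUE
```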
From Thierry.ONKELINX at inbo.be Wed Oct 5 11:54:34 2011 From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry) Date: Wed, 5 Oct 2011 09:54:34 +0000 Subject: [R] optimize R code: replace for loop In-Reply-To: <1317808173267-3873945.post@n4.nabble.com> References: <1317808173267-3873945.post@n4.nabble.com> Message-ID: You can vectorize it using cumsum. cumsum(c(1, testvec)) all.equal(final.sum, cumsum(c(1, testvec))) > -----Original message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On behalf of Chris82 > Sent: Wednesday 5 October 2011 11:50 > To: r-help at r-project.org > Subject: [R] optimize R code: replace for loop > > Dear R Users, > > at the moment I am trying to optimize an R script. > > testvec <- c(0,1,0,1,1,1,1,0,0,1,0,1,0) > > > sum.testvec <- vector() > tempsum <- 1 > for (e in 1:length(testvec)){ > sum.testvec[e] <- tempsum+testvec[e] > tempsum <- sum.testvec[e] > > } > > final.sum <- c(1,sum.testvec) > > > Is there an option to do something with apply? Unfortunately I am not so > familiar with the apply functions. > > Thanks. > > -- > View this message in context: http://r.789695.n4.nabble.com/optimize-R-code-replace-for-loop-tp3873945p3873945.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From renaud at mancala.cbio.uct.ac.za Wed Oct 5 12:13:44 2011 From: renaud at mancala.cbio.uct.ac.za (Renaud Gaujoux) Date: Wed, 05 Oct 2011 12:13:44 +0200 Subject: [R] Behaviour of 'source' with URLs and proxy Message-ID: <4E8C2DD8.9050202@cbio.uct.ac.za> Hi, I am having trouble sourcing a file from our local network in R.
It looks like this file is not properly accessed by 'source', even though it can be downloaded with download.file (see below for my settings and some tests I did). I ended up with a workaround, but I would like to understand what is going on. Doesn't source/readLines use the same mechanism as download.file to access URLs? Thank you. Renaud My settings: - I am using R 2.13.2 on Ubuntu 11.04. - I am accessing the internet through a proxy (set up with cntlm; I am not sure whether this is the issue, but I don't know how to check without it). This means that http_proxy='http://localhost:8080/'. - We have local CRAN/Bioconductor mirrors that can be accessed without going through the proxy. - My .Rprofile sources a file 'setrepos.R' on the local network that sets all relevant repos to our local mirrors. From the shell: - I can wget any URL (local or internet) from the command line without a problem. - In particular, I can wget the file 'setrepos.R' from the command line. Symptoms: - With options(download.file.method='wget'), I can download any URL (local or internet) with download.file. - I _cannot_ source any local or internet URL if http_proxy is set. It simply freezes. Using internet.info=0 gives the following messages: ############ Warning messages: 1: In file(file, "r", encoding = encoding) : using HTTP proxy 'http://localhost:8080/' 2: In file(file, "r", encoding = encoding) : connected to 'localhost' on port 8080.
3: In file(file, "r", encoding = encoding) : -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0 Host: *OUR_HOST* Pragma: no-cache User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu) 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: Keep-Alive 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597 9: In file(file, "r", encoding = encoding) : <- Date: Wed, 05 Oct 2011 06:43:13 GMT 10: In file(file, "r", encoding = encoding) : <- Content-Type: text/plain 11: In file(file, "r", encoding = encoding) : <- ETag: "30b8018-63d-4a627b821c980" 12: In file(file, "r", encoding = encoding) : <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 PHP/5.2.6-2ubuntu4.6 with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes 14: In file(file, "r", encoding = encoding) : <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT 15: In file(file, "r", encoding = encoding) : Code 200, content-type 'text/plain' ############ - Setting options(download.file.method='wget') before sourcing does not change the behaviour. - However, I can source any local URL if http_proxy='', without changing download.file.method. But then download.file does not work for internet URL any more since the proxy settings are wrong. I could set http_proxy='', then source, then restore the proxy settings and set options(download.file.method='wget'). But this is just a work around and I would like to understand what is going on. 
Session Info: R version 2.13.2 (2011-09-30) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] devtools_0.4 loaded via a namespace (and not attached): [1] RCurl_1.6-10 tools_2.13.2 -- Renaud Gaujoux Computational Biology - University of Cape Town South Africa ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:5}} From jim at bitwrit.com.au Wed Oct 5 12:13:40 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Wed, 05 Oct 2011 21:13:40 +1100 Subject: [R] break.axis all range of data In-Reply-To: References: Message-ID: <4E8C2DD4.3070407@bitwrit.com.au> On 10/05/2011 09:52 AM, Heverkuhn Heverkuhn wrote: > Hello R users, > > I have a plot type=b with x axis at=(1:36), > I would like to increase the distance between x tick-marks 8 and 9, and not > connect the points x=8 and x=9. > I can do the second thing, setting type="p" and then drawing the lines, but > I don't know how to do the first. > > Plus, I was wondering if there is a function that allows one to insert a gap > without covering data points, like break.axis does. > Hi Heverkuhn, The axis.break function (plotrix) allows the user to insert a break mark on an axis with three styles (zigzag, slash and gap). From your question, I think you want one of the first two styles. Jim From brian at braverock.com Wed Oct 5 12:21:34 2011 From: brian at braverock.com (Brian G.
Peterson) Date: Wed, 05 Oct 2011 05:21:34 -0500 Subject: [R] [R-SIG-Finance] AsOf join in R In-Reply-To: References: Message-ID: <1317810094.4192.316.camel@brian-desktop> On Tue, 2011-10-04 at 23:41 -0400, Robert A'gata wrote:
> AsOf(A,B) should return
>
>             A   B
> 2011-09-01 10 1.1
> 2011-09-09 15 1.1 # (because latest value B prior to 2011-09-09 is 1.1)
> 2011-09-10 20 1.5
> 2011-09-15 25 1.7
>
> How do I write the above AsOf function in R? The merge function does
> not do what I want because it will align points that have the same
> time stamp together while what I want is actually latest value prior
> to timestamp in A. Any example would be greatly appreciated. Thank
> you.

library(xts)  # xts(); attaching xts also attaches zoo, which provides na.locf()
A <- xts(c(10,15,20,25), order.by=as.POSIXct(c("2011-09-01","2011-09-09","2011-09-10","2011-09-15")))
B <- xts(c(1.1,1.5,1.3,1.7), order.by=as.POSIXct(c("2011-08-31","2011-09-09","2011-09-11","2011-09-12")))
AsOf <- function(a, b) {
  x <- cbind(a, b)          # outer join on the union of the two time indexes
  x[,2] <- na.locf(x[,2])   # carry the latest B value forward
  x[!is.na(x[,1])]          # keep only the time stamps present in A
}
# A B value stamped at the same time as an A row is used for that row,
# so 2011-09-09 gets 1.5 here rather than the strictly earlier 1.1.
AsOf(A,B)
#####################
#            ..1 ..2
#2011-09-01   10 1.1
#2011-09-09   15 1.5
#2011-09-10   20 1.5
#2011-09-15   25 1.7
#####################
-- Brian G. Peterson http://braverock.com/brian/ Ph: 773-459-4973 IM: bgpbraverock From ripley at stats.ox.ac.uk Wed Oct 5 12:26:37 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 5 Oct 2011 11:26:37 +0100 (BST) Subject: [R] Behaviour of 'source' with URLs and proxy In-Reply-To: <4E8C2DD8.9050202@cbio.uct.ac.za> References: <4E8C2DD8.9050202@cbio.uct.ac.za> Message-ID: On Wed, 5 Oct 2011, Renaud Gaujoux wrote: > Hi, > > I am having troubles sourcing a file from our local network from R. > It looks like this file are not properly accessed by 'source', even they can > be downloaded with download.file. (See below my settings and some tests I > did). I ended up with a work around, but I would like to understand what is > going on. > > Doesn't source/readLines uses the same mechanism as download.file to access > URLs? No. They use url() connections. See ?file.
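Ripley's point can be checked directly. As the warnings in the original message show ("In file(file, "r", encoding = encoding)"), source() hands the URL to file(), which opens a url() connection (in the R 2.13.2 used in this thread, through R's internal HTTP code) instead of calling download.file(). A minimal sketch, using a hypothetical helper read_url_lines() that is not part of R, opens the same kind of connection without source() so that the connection can be tested on its own. It also lowers the documented timeout option, on the assumption that a stalled read honours it and fails with an error instead of freezing.

```r
# Hypothetical helper (not part of R): read a URL through a url() connection,
# the same kind of connection source() uses, with a temporary timeout.
read_url_lines <- function(u, timeout = 10) {
  old <- options(timeout = timeout)   # 'timeout' is in seconds
  on.exit(options(old), add = TRUE)   # restore the previous setting afterwards
  con <- url(u, open = "rt")          # open for reading in text mode
  on.exit(close(con), add = TRUE)     # always release the connection
  readLines(con, warn = FALSE)
}
```

For example, calling read_url_lines("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt") with http_proxy set: if this also stalls, the problem lies in the url() connection and the proxy rather than in source() itself.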
-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From ripley at stats.ox.ac.uk Wed Oct 5 12:29:16 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 5 Oct 2011 11:29:16 +0100 (BST) Subject: [R] Strange error msg when plotting a graphics In-Reply-To: References: Message-ID: On Tue, 4 Oct 2011, Eduardo M. A. M. Mendes wrote: > Dear R-Users > > I have come across an error that apparently has nothing to do with > the command itself. Here is the error > > (w - matrix (or vector) and testXaxis - dates).
> >> plot(data.frame(testXaxis,w),col="blue",ylab="Q, [m3/s]",xlab="Data", > + main="Free-run - Modelo NARX MISO - Test Data") > Error in gzfile(file, "wb") : cannot open the connection > In addition: Warning message: > In gzfile(file, "wb") : > cannot open compressed file '/Users/eduardo/.rstudio-desktop/graphics/0540fa66-727a-4f8f-a1b5-f6501e96a393.snapshot', probable reason 'No such file or directory' > Graphics error: Error in gzfile(file, "wb") : cannot open the connection > > The error did not stop the graphics from being plotted. > > What could be wrong? Using R-help to report a problem in Rstudio? When reporting graphics problems, you need to report the graphics device you use, as well as the 'at a minimum' information required by the posting guide. This looks like a problem in Rstudio's own device/UI, but I am guessing from the file name. If you use an alternative front-end such as Rstudio or Tinn-R (or even RKWard) please do mention it in your posting. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From renaud at mancala.cbio.uct.ac.za Wed Oct 5 12:44:12 2011 From: renaud at mancala.cbio.uct.ac.za (Renaud Gaujoux) Date: Wed, 05 Oct 2011 12:44:12 +0200 Subject: [R] Behaviour of 'source' with URLs and proxy In-Reply-To: References: <4E8C2DD8.9050202@cbio.uct.ac.za> Message-ID: <4E8C34FC.20804@cbio.uct.ac.za> From the help page ?file I -- had -- read the following: "For 'url' the description is a complete URL, including scheme (such as 'http://', 'ftp://' or 'file://'). Proxies can be specified for HTTP and FTP 'url' connections: see 'download.file'." From the internet.info messages it seems that the proxy is actually used, but somehow differently than what download.file does (via wget). Is source supposed to work through a proxy?
-- Renaud Gaujoux Computational Biology - University of Cape Town South Africa From hans at sociologi.cjb.net Wed Oct 5 13:13:30 2011 From: hans at sociologi.cjb.net (Hans Ekbrand) Date: Wed, 5 Oct 2011 13:13:30 +0200 Subject: [R] Behaviour of 'source' with URLs and proxy In-Reply-To: <4E8C34FC.20804@cbio.uct.ac.za> References: <4E8C2DD8.9050202@cbio.uct.ac.za> <4E8C34FC.20804@cbio.uct.ac.za> Message-ID: <20111005111329.GA11617@ingegerdsdator> On Wed, Oct 05, 2011 at 12:44:12PM +0200, Renaud Gaujoux wrote: > Is source supposed to work through a proxy?
This worked for me:
> Sys.setenv(http_proxy="http://192.168.0.252:8118")
> source("http://pc5.socio.gu.se:84/enkel-kurva.r", echo = T)
> my.vectory = c(1,30,2,3,3,4)
> my.vectorx = c(1,2,3,4,5,6)
> plot(y = my.vectory, x = my.vectorx, type = "l")
From johannes_graumann at web.de Wed Oct 5 13:29:15 2011 From: johannes_graumann at web.de (Johannes Graumann) Date: Wed, 5 Oct 2011 14:29:15 +0300 Subject: [R] Vector-subsetting with ZERO - Is behavior changeable? Message-ID: Dear All, I have trouble generalizing some code.
> index <- 0
> sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})
This yields the vector I want:
[1] 3 2 1
But in this case (trying to select the second-to-last element in each vector of the list)
> index <- 1
> sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})
I end up with
[[1]]
[1] 2

[[2]]
[1] 1

[[3]]
numeric(0)

I would (massively) prefer something like
[1] 2 1 NA
My current implementation looks like
> index <- 1
> unlist(
>   sapply(
>     list(c(1,2,3),c(1,2),c(1)),
>     function(x){
>       value <- x[max(length(x)-index,0)]
>       if(identical(value,numeric(0))){return(NA)} else {return(value)}
>     }
>   )
> )
[1] 2 1 NA
Quite the inelegant eyesore. Any hints on how to do this better? Thanks, Joh From mayer at psychologie.tu-dresden.de Wed Oct 5 13:30:17 2011 From: mayer at psychologie.tu-dresden.de (René Mayer) Date: Wed, 05 Oct 2011 13:30:17 +0200 Subject: [R] lattice-dotplot: resize axis Message-ID: <20111005133017.13042hig8pufygnd@psy2.psych.tu-dresden.de> dear all, I want to make a dotplot with ratings from Items in 6 ItemGroups. I reordered the items by rating within each group. I plotted the items by rating conditional on ItemGroup.
The ordering works as I wanted, but my y-axis labels (items) within each ItemGroup are now unequally spaced, e.g., in some panels there is a gap between a lower-rated item and the next higher one. To give a picture (items = a, e, f, g; ItemGroup = n):
-----------------
g| .
f| .
e| .
 |
 |
 |
a| .
-----------------
How can I correct this? What have I overlooked?
# code I've used (from latticeExtra/utilities/resize panels)
library(latticeExtra)
mean.ratings$item.name <- with(mean.ratings, reorder(reorder(item, rating), as.numeric(ItemGroup)))
dpratings <- dotplot(item.name ~ rating | reorder(ItemGroup, rating), data = mean.ratings, layout = c(1, 6), xlim = c(1, 6), aspect = .1, scales = list(y = list(relation = "free", cex = .5)))
## approximate
resizePanels(dpratings, h = with(mean.ratings, table(reorder(ItemGroup, rating))))
thanks, René From ripley at stats.ox.ac.uk Wed Oct 5 13:45:58 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 5 Oct 2011 12:45:58 +0100 (BST) Subject: [R] Behaviour of 'source' with URLs and proxy In-Reply-To: <4E8C34FC.20804@cbio.uct.ac.za> References: <4E8C2DD8.9050202@cbio.uct.ac.za> <4E8C34FC.20804@cbio.uct.ac.za> Message-ID: On Wed, 5 Oct 2011, Renaud Gaujoux wrote: > From the help page ?file I -- had -- read the following: > > "For 'url' the description is a complete URL, including scheme > (such as 'http://', 'ftp://' or 'file://'). Proxies can be > specified for HTTP and FTP 'url' connections: see 'download.file'." So you should have known that it was the same as url()! > From the internet.info messages it seems that the proxy is actually used, but > somehow differently than what download.file does (via wget). No, somewhat differently than *wget* does. As that help page says, the section on proxies only refers to the internal method. > Is source supposed to work through a proxy? Yes, and it has been tested to do so. But not tested on your proxy ....
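Since the proxy section of ?file applies only to the internal method, and the original message reports that download.file() with options(download.file.method='wget') works through the same proxy, one workaround is to let download.file() fetch the script and then source() the local copy. A minimal sketch, with a hypothetical helper source_via_download() that is not part of R. It assumes that the configured method (e.g. "wget") handles the proxy, as the shell tests in the original message suggest.

```r
# Hypothetical helper (not part of R): fetch a script with download.file(),
# so the transfer uses the configured method (e.g. "wget") and its proxy
# handling, then source() the local copy instead of the URL itself.
source_via_download <- function(u, ...,
                                method = getOption("download.file.method", "auto")) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)   # remove the temporary copy afterwards
  status <- download.file(u, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0L) stop("download.file() returned non-zero status ", status)
  source(tmp, ...)                   # extra arguments (e.g. local =) go to source()
}
```

In a setup like the one described, this could be called from .Rprofile on the setrepos.R URL after options(download.file.method = "wget"). The trade-off is that source() then sees a temporary file name rather than the original URL.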
-- Brian D.
Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From bt_jannis at yahoo.de Wed Oct 5 13:56:52 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 5 Oct 2011 12:56:52 +0100 Subject: [R] help with regexp Message-ID: <1317815812.96272.YahooMailClassic@web28215.mail.ukl.yahoo.com> Dear list members, I am stuck with using regular expressions. Imagine I have a vector of character strings like: test <- c('filename_1_def.pdf', 'filename_2_abc.pdf') How could I use regular expressions to extract only the 'def'/'abc' parts of these strings? My attempts yielded no results: testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE) Somehow I seem to be missing an important concept here. Until now I have always used nested sub expressions like: testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test)) but this tends to become cumbersome, and I was wondering whether there is a more elegant way to do this. Thanks for any help Jannis From fomcl at yahoo.com Wed Oct 5 14:37:14 2011 From: fomcl at yahoo.com (Albert-Jan Roskam) Date: Wed, 5 Oct 2011 05:37:14 -0700 Subject: [R] help with regexp In-Reply-To: <1317815812.96272.YahooMailClassic@web28215.mail.ukl.yahoo.com> References: <1317815812.96272.YahooMailClassic@web28215.mail.ukl.yahoo.com> Message-ID: <1317818234.79647.YahooMailNeo@web110716.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From renaud at mancala.cbio.uct.ac.za Wed Oct 5 14:46:13 2011 From: renaud at mancala.cbio.uct.ac.za (Renaud Gaujoux) Date: Wed, 05 Oct 2011 14:46:13 +0200 Subject: [R] Behaviour of 'source' with URLs and proxy In-Reply-To: References: <4E8C2DD8.9050202@cbio.uct.ac.za> <4E8C34FC.20804@cbio.uct.ac.za> Message-ID: <4E8C5195.1040602@cbio.uct.ac.za> On 05/10/2011 13:45, Prof Brian Ripley wrote: > On Wed, 5 Oct 2011, Renaud Gaujoux wrote: > >> From the help page ?file I -- had -- read the following: >> >> "For 'url' the description is a complete URL, including scheme >> (such as 'http://', 'ftp://' or 'file://'). Proxies can be >> specified for HTTP and FTP 'url' connections: see 'download.file'." > > So you should have known that it was the same as url()! I agree. I just thought -- incorrectly -- that any attempt to download a file from R would eventually call the same C code as download.file. Or maybe source() does not download and then source, but reads the file on the fly? > >> From the internet.info messages it seems that the proxy is actually >> used, but somehow differently than what download.file does (via wget). > > No, somewhat differently than *wget* does. As that help page says, > the section on proxies only refers to the internal method. > >> Is source supposed to work through a proxy? > > Yes, and it has been tested to do so. But not tested on your proxy .... OK, I agree that my settings look special, but in the end it is supposed to be a plain local proxy with no authentication. The proxy is effectively used by the internal method and, from the messages (below), the remote file is opened and the HTTP headers are returned, but nothing else happens and I have to cancel the command (Ctrl-C). This is where I would like some input, so that I can work out the issue. I tried to go through the C code of the internet module without much luck: it seems that in_R_HTTPRead and RxmlNanoHTTPRead would be the place to look.
Any idea on what would cause these functions to hang (infinite loop, communication problem, ...)? I know, I am too curious. Thank you > Sys.getenv('http_proxy') [1] "http://localhost:8080/" > Sys.getenv('no_proxy') [1] "localhost,127.0.0.0/8,*.local" > options(internet.info=0) > download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt") trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt' Content type 'text/plain' length 1209 bytes opened URL ^C There were 15 warnings (use warnings() to see them) > warnings() Warning messages: 1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : connected to 'localhost' on port 8080. 2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0 Host: lib.stat.cmu.edu User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu) 3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- HTTP/1.1 200 OK 4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004 5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Connection: Keep-Alive 6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Proxy-Connection: Keep-Alive 7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Content-Length: 1209 8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Age: 747 9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Date: Wed, 05 Oct 2011 11:52:25 GMT 10: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Content-Type: text/plain 11: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- ETag: "5c700f3-4b9-399383c0" 12: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Server: Apache 13: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... 
: <- Accept-Ranges: bytes 14: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : <- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT 15: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : Code 200, content-type 'text/plain' > > >> >> -- >> Renaud Gaujoux >> Computational Biology - University of Cape Town >> South Africa >> >> >> On 05/10/2011 12:26, Prof Brian Ripley wrote: >>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote: >>> >>>> Hi, >>>> >>>> I am having troubles sourcing a file from our local network from R. >>>> It looks like this file are not properly accessed by 'source', even >>>> they can be downloaded with download.file. (See below my settings >>>> and some tests I did). I ended up with a work around, but I would >>>> like to understand what is going on. >>>> >>>> Doesn't source/readLines uses the same mechanism as download.file >>>> to access URLs? >>> >>> No. They use url() connections. See ?file. >>> >>>> >>>> Thank you. >>>> >>>> Renaud >>>> >>>> My setting: >>>> - I am using R 2.13.2 on Ubuntu 11.04. >>>> - I am accessing internet through a proxy (set up with cntlm, not >>>> sure if this is the issue but I don't know how to check without >>>> it). This means that http_proxy='http://localhost:8080/'. >>>> - We have local CRNA/BioConductor mirrors that can be accessed >>>> without going through the proxy. >>>> - My .Rprofile sources a file 'setrepos.R' on the local network, >>>> that sets all relevant repos to our local mirrors. >>>> >>>> From the shell: >>>> - I can wget any URL (local or internet) from command line without >>>> a problem. >>>> - In particular I can wget the file 'setrepos.R' from command line. >>>> >>>> Symptoms: >>>> - with options(download.file.method='wget'), I can download any URL >>>> (local or internet) with download.file >>>> - I _cannot_ source any local or internet URL if http_proxy is set. >>>> It simply freezes. 
Using internet.info=0 gives the following messages: >>>> ############ >>>> Warning messages: >>>> 1: In file(file, "r", encoding = encoding) : >>>> using HTTP proxy 'http://localhost:8080/' >>>> 2: In file(file, "r", encoding = encoding) : >>>> connected to 'localhost' on port 8080. >>>> 3: In file(file, "r", encoding = encoding) : >>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0 >>>> Host: *OUR_HOST* >>>> Pragma: no-cache >>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu) >>>> >>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK >>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004 >>>> 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive >>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: >>>> Keep-Alive >>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597 >>>> 9: In file(file, "r", encoding = encoding) : >>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT >>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type: >>>> text/plain >>>> 11: In file(file, "r", encoding = encoding) : >>>> <- ETag: "30b8018-63d-4a627b821c980" >>>> 12: In file(file, "r", encoding = encoding) : >>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 >>>> PHP/5.2.6-2ubuntu4.6 with Suhosin-Patch mod_python/3.3.1 >>>> Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0 >>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes >>>> 14: In file(file, "r", encoding = encoding) : >>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT >>>> 15: In file(file, "r", encoding = encoding) : Code 200, >>>> content-type 'text/plain' >>>> ############ >>>> >>>> - Setting options(download.file.method='wget') before sourcing does >>>> not change the behaviour. >>>> - However, I can source any local URL if http_proxy='', without >>>> changing download.file.method. 
But then download.file does not work >>>> for internet URLs any more since the proxy settings are wrong. I >>>> could set http_proxy='', then source, then restore the proxy >>>> settings and set options(download.file.method='wget'). But this is >>>> just a workaround and I would like to understand what is going on. >>>> >>>> Session Info: >>>> >>>> R version 2.13.2 (2011-09-30) >>>> Platform: x86_64-pc-linux-gnu (64-bit) >>>> >>>> locale: >>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C >>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8 >>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C >>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C >>>> >>>> attached base packages: >>>> [1] stats graphics grDevices utils datasets methods base >>>> >>>> other attached packages: >>>> [1] devtools_0.4 >>>> >>>> loaded via a namespace (and not attached): >>>> [1] RCurl_1.6-10 tools_2.13.2 >>>> >>>> -- >>>> Renaud Gaujoux >>>> Computational Biology - University of Cape Town >>>> South Africa >>>> >>> >> >
From ripley at stats.ox.ac.uk Wed Oct 5 14:49:49 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 5 Oct 2011 13:49:49 +0100 (BST) Subject: [R] Behaviour of 'source' with URLs and proxy In-Reply-To: <4E8C5195.1040602@cbio.uct.ac.za> References: <4E8C2DD8.9050202@cbio.uct.ac.za> <4E8C34FC.20804@cbio.uct.ac.za> <4E8C5195.1040602@cbio.uct.ac.za> Message-ID: On Wed, 5 Oct 2011, Renaud Gaujoux wrote: > > On 05/10/2011 13:45, Prof Brian Ripley wrote: >> On Wed, 5 Oct 2011, Renaud Gaujoux wrote: >> >>> From the help page ?file I -- had -- read the following: >>> >>> "For 'url' the description is a complete URL, including scheme >>> (such as 'http://', 'ftp://' or 'file://'). Proxies can be >>> specified for HTTP and FTP 'url' connections: see 'download.file'." >> >> So you should have known that it was the same as url()! > > I agree. I just thought -- incorrectly -- that any attempt to download a file > from R would eventually call the same C code as download.file. Or maybe It does. But download.file(method="wget") does not call that C code .... > source() does not download and source, but reads the file on the fly? That is true too, but then downloading a file is always done in chunks. >> >>> From the internet.info messages it seems that the proxy is actually used, >>> but somehow differently than what download.file does (via wget). >> >> No, somewhat differently than *wget* does. As that help page says, the >> section on proxies only refers to the internal method. >> >>> Is source supposed to work through a proxy? >> >> Yes, and it has been tested to do so. But not tested on your proxy ....
> > OK, I agree that my settings look special, but in the end it is supposed to > be a plain local proxy with no authentication. > The proxy is effectively used by the internal method and, from the messages > (below), the remote file is opened, http headers are returned, but nothing > else happens and I have to cancel the command (Ctrl-C). > > This is where I would like to have some input, so that I can work out the > issue. > I tried to go through the C code for internet with no great luck: it seems that > in_R_HTTPRead and RxmlNanoHTTPRead would be the place to look. > > Any idea on what would cause these functions to hang (infinite loop, > communication problem, ...)? > I know, I am too curious. > > Thank you > > >> Sys.getenv('http_proxy') > [1] "http://localhost:8080/" >> Sys.getenv('no_proxy') > [1] "localhost,127.0.0.0/8,*.local" >> options(internet.info=0) >> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt") > trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt' > Content type 'text/plain' length 1209 bytes > opened URL > ^C > There were 15 warnings (use warnings() to see them) >> warnings() > Warning messages: > 1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > connected to 'localhost' on port 8080. > 2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0 > Host: lib.stat.cmu.edu > User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu) > > 3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- HTTP/1.1 200 OK > 4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004 > 5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Connection: Keep-Alive > 6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ...
: > <- Proxy-Connection: Keep-Alive > 7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Content-Length: 1209 > 8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Age: 747 > 9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Date: Wed, 05 Oct 2011 11:52:25 GMT > 10: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Content-Type: text/plain > 11: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- ETag: "5c700f3-4b9-399383c0" > 12: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Server: Apache > 13: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Accept-Ranges: bytes > 14: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > <- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT > 15: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : > Code 200, content-type 'text/plain' > >> >> >>> >>> -- >>> Renaud Gaujoux >>> Computational Biology - University of Cape Town >>> South Africa >>> >>> >>> On 05/10/2011 12:26, Prof Brian Ripley wrote: >>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote: >>>> >>>>> Hi, >>>>> >>>>> I am having trouble sourcing a file from our local network in R. >>>>> It looks like this file is not properly accessed by 'source', even though it >>>>> can be downloaded with download.file. (See below my settings and some >>>>> tests I did). I ended up with a workaround, but I would like to >>>>> understand what is going on. >>>>> >>>>> Doesn't source/readLines use the same mechanism as download.file to >>>>> access URLs? >>>> >>>> No. They use url() connections. See ?file. >>>> >>>>> >>>>> Thank you. >>>>> >>>>> Renaud >>>>> >>>>> My setup: >>>>> - I am using R 2.13.2 on Ubuntu 11.04. >>>>> - I am accessing the internet through a proxy (set up with cntlm, not sure >>>>> if this is the issue but I don't know how to check without it).
This >>>>> means that http_proxy='http://localhost:8080/'. >>>>> - We have local CRAN/Bioconductor mirrors that can be accessed without >>>>> going through the proxy. >>>>> - My .Rprofile sources a file 'setrepos.R' on the local network, that >>>>> sets all relevant repos to our local mirrors. >>>>> >>>>> From the shell: >>>>> - I can wget any URL (local or internet) from the command line without a >>>>> problem. >>>>> - In particular I can wget the file 'setrepos.R' from the command line. >>>>> >>>>> Symptoms: >>>>> - with options(download.file.method='wget'), I can download any URL >>>>> (local or internet) with download.file >>>>> - I _cannot_ source any local or internet URL if http_proxy is set. It >>>>> simply freezes. Using internet.info=0 gives the following messages: >>>>> ############ >>>>> Warning messages: >>>>> 1: In file(file, "r", encoding = encoding) : >>>>> using HTTP proxy 'http://localhost:8080/' >>>>> 2: In file(file, "r", encoding = encoding) : >>>>> connected to 'localhost' on port 8080.
>>>>> 3: In file(file, "r", encoding = encoding) : >>>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0 >>>>> Host: *OUR_HOST* >>>>> Pragma: no-cache >>>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu) >>>>> >>>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK >>>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004 >>>>> 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive >>>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: >>>>> Keep-Alive >>>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597 >>>>> 9: In file(file, "r", encoding = encoding) : >>>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT >>>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type: >>>>> text/plain >>>>> 11: In file(file, "r", encoding = encoding) : >>>>> <- ETag: "30b8018-63d-4a627b821c980" >>>>> 12: In file(file, "r", encoding = encoding) : >>>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 PHP/5.2.6-2ubuntu4.6 >>>>> with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 >>>>> OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0 >>>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes >>>>> 14: In file(file, "r", encoding = encoding) : >>>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT >>>>> 15: In file(file, "r", encoding = encoding) : Code 200, content-type >>>>> 'text/plain' >>>>> ############ >>>>> >>>>> - Setting options(download.file.method='wget') before sourcing does not >>>>> change the behaviour. >>>>> - However, I can source any local URL if http_proxy='', without changing >>>>> download.file.method. But then download.file does not work for internet >>>>> URLs any more since the proxy settings are wrong. I could set >>>>> http_proxy='', then source, then restore the proxy settings and set >>>>> options(download.file.method='wget'). But this is just a workaround and >>>>> I would like to understand what is going on.
>>>>> >>>>> Session Info: >>>>> >>>>> R version 2.13.2 (2011-09-30) >>>>> Platform: x86_64-pc-linux-gnu (64-bit) >>>>> >>>>> locale: >>>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C >>>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8 >>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C >>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C >>>>> >>>>> attached base packages: >>>>> [1] stats graphics grDevices utils datasets methods base >>>>> >>>>> other attached packages: >>>>> [1] devtools_0.4 >>>>> >>>>> loaded via a namespace (and not attached): >>>>> [1] RCurl_1.6-10 tools_2.13.2 >>>>> >>>>> -- >>>>> Renaud Gaujoux >>>>> Computational Biology - University of Cape Town >>>>> South Africa >>>>> >>>> >>> >> >
-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From sarah.dryhurst08 at imperial.ac.uk Wed Oct 5 12:53:59 2011 From: sarah.dryhurst08 at imperial.ac.uk (Dryhurst, Sarah) Date: Wed, 5 Oct 2011 10:53:59 +0000 Subject: [R] anova.pgls not working for factors in univariate analyses Message-ID: <197AC8EFAD839D4FB4D644604FB950840416FE@icexch-m4.ic.ac.uk> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sridhar.vukkadapu at gmail.com Wed Oct 5 10:31:49 2011 From: sridhar.vukkadapu at gmail.com (sridhar) Date: Wed, 5 Oct 2011 02:31:49 -0600 Subject: [R] unable to install 'pasilla' package on R In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gloriaalbe1 at yahoo.it Wed Oct 5 10:42:20 2011 From: gloriaalbe1 at yahoo.it (lunarossa) Date: Wed, 5 Oct 2011 01:42:20 -0700 (PDT) Subject: [R] Does it exist a function for this? Message-ID: <1317804140079-3873827.post@n4.nabble.com> I have this kind of matrix, with thousands of cases.
A 2 apple
A 2 peach
A 3 peach
B 1 pear
B 4 peach
B 4 beef
B 7 beef
C 1 peach
D 2 apple
D 5 peach
I need to separate the rows containing "peach" from the other rows, and this is not a problem. I also need to distinguish peach rows like the 2nd one (whose first two cells, "A" and "2", also occur with "apple" in the 1st row) from rows like the 3rd or the 8th, where the combination of the first two cells is unique ("A" and "3", or "C" and "1"). -- View this message in context: http://r.789695.n4.nabble.com/Does-it-exist-a-function-for-this-tp3873827p3873827.html Sent from the R help mailing list archive at Nabble.com. From gloriaalbe1 at yahoo.it Wed Oct 5 10:55:33 2011 From: gloriaalbe1 at yahoo.it (lunarossa) Date: Wed, 5 Oct 2011 01:55:33 -0700 (PDT) Subject: [R] Which function for this? Message-ID: <1317804933476-3873856.post@n4.nabble.com> I have a matrix like this:
0.05 0.13 1.2 0 0 0 0 0 red
0 0 0 0 0 0 0 0 white
0 0.06 0 0 0 0 0 0 blue
If at least one number in the first 8 columns is greater than 0, I write 1 in a new variable; if they are all 0 or less, I write 0, so:
0.05 0.13 1.2 0 0 0 0 0 red 1
0 0 0 0 0 0 0 0 white 0
0 0.06 0 0 0 0 0 0 blue 1
I want to find out whether the value of the new variable (1 or 0) is related to the colour. I can use a chi-square test; what else? ANOVA, and what else? And to relate the first 8 columns to the colour, can I use a logit model? -- View this message in context: http://r.789695.n4.nabble.com/Which-function-for-this-tp3873856p3873856.html Sent from the R help mailing list archive at Nabble.com. From silverstein_yellowcard at yahoo.com Wed Oct 5 11:07:09 2011 From: silverstein_yellowcard at yahoo.com (Leynnard Rey Matillano) Date: Wed, 5 Oct 2011 02:07:09 -0700 (PDT) Subject: [R] kriging shapefiles Message-ID: <1317805629.20625.YahooMailNeo@web110713.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From johnny.jp.22 at gmail.com Wed Oct 5 11:47:07 2011 From: johnny.jp.22 at gmail.com (Johnny Paulo) Date: Wed, 5 Oct 2011 10:47:07 +0100 Subject: [R] Weird behaviour of tab characters in a string in R (vs Python) Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From fomcl at yahoo.com Wed Oct 5 12:56:45 2011 From: fomcl at yahoo.com (Albert-Jan Roskam) Date: Wed, 5 Oct 2011 03:56:45 -0700 (PDT) Subject: [R] optimize R code: replace for loop In-Reply-To: References: <1317808173267-3873945.post@n4.nabble.com> Message-ID: <1317812205.28259.YahooMailNeo@web110704.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From scott.raynaud at yahoo.com Wed Oct 5 13:44:20 2011 From: scott.raynaud at yahoo.com (Scott Raynaud) Date: Wed, 5 Oct 2011 04:44:20 -0700 (PDT) Subject: [R] SPlus to R In-Reply-To: References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From richard.iles at griffithuni.edu.au Wed Oct 5 14:25:51 2011 From: richard.iles at griffithuni.edu.au (Richard Iles) Date: Wed, 5 Oct 2011 22:25:51 +1000 Subject: [R] repeating categorical variable codes Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From xenon99 at hotmail.com Wed Oct 5 14:54:57 2011 From: xenon99 at hotmail.com (Darius H) Date: Wed, 5 Oct 2011 12:54:57 +0000 Subject: [R] rolling regression In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From rshepard at appl-ecosys.com Wed Oct 5 14:59:49 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Wed, 5 Oct 2011 05:59:49 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: On Wed, 5 Oct 2011, Petr PIKAL wrote: > Hm. I seldom use such an approach. In your original request you said you want > to split your data into smaller data frames based on sites Petr, I need the additional information in the database, too. > From what we know it is difficult to say if there is some common feature > in the site variable. If it is organised like > XY-N > you can simply make a new variable from the first two letters Unfortunately, the site designations are not so uniform. As I went through the process of re-doing the data I discovered this lack of consistency, which resulted in duplicate records because one site had been designated both XX-n and XXn. Had to clean those up, too. > sites <- substr(chemdata$site,1,2) > > then you can split your data frame according to sites > > chem.spl <- split(chemdata, sites) > > and do anything with your split data frames organised in a list First thing this morning I'm upgrading to 2.13.2 and hoping that this fixes an issue that just showed up yesterday afternoon: not being able to access function help pages. For example, I tried ?subset and ?split because I thought the latter is really what I want, yet R told me no help was found. Strange; it was there a week ago. Thanks, Rich From batholdy at googlemail.com Wed Oct 5 15:00:31 2011 From: batholdy at googlemail.com (Martin Batholdy) Date: Wed, 5 Oct 2011 15:00:31 +0200 Subject: [R] mean of 3D arrays Message-ID: <5E6A2740-10A7-4CED-9B41-32A109317325@googlemail.com> Hi, I have multiple three-dimensional arrays.
Like this:
x1 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10))
x2 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10))
x3 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10))
Now I would like to compute the mean for each corresponding cell. As a result I want to get one 3D array (10 x 10 x 10) in which the value at position x, y, z is the mean of the corresponding values of x1, x2 and x3 (at position x, y, z). How can I do this? From kevin.thorpe at utoronto.ca Wed Oct 5 15:01:49 2011 From: kevin.thorpe at utoronto.ca (Kevin E. Thorpe) Date: Wed, 05 Oct 2011 09:01:49 -0400 Subject: [R] SPlus to R In-Reply-To: <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> Message-ID: <4E8C553D.4060209@utoronto.ca> On 10/05/2011 07:44 AM, Scott Raynaud wrote: > Hope I did this right. I repeated what I'd done before: > > 1) Opened script 2) Selected run all (this produced my initial post) > > Then as suggested I: > > 3) Typed ls() 4) Saw that the function was present and issued > sshc(100,10) > > Here's what I got: > >> ls() > [1] "c.searchd" "convex" "Epower" "nef" "nef2" > "power1.f" [7] "ss.rand" "sshc" "vertex" >> sshc(100,10) > Error in return(ne = ne, Ep = Ep1) : multi-argument returns are not > permitted > > So it looks like I need to change the return(ne = ne, Ep = Ep1) to > two separate lines, correct? As another poster suggested, use list(ne = ne, Ep = Ep1) instead of the present return(). > On a brighter note, I did get a power curve as expected. One thing I > don't understand is the meaning of the arguments in sshc(100,10). > > Thanks again for your help. > > > ________________________________ From: Barry > Rowlingson > > Cc: "r-help at r-project.org" Sent: Wednesday, > October 5, 2011 4:02 AM Subject: Re: [R] SPlus to R > > > te: >> I'm trying to convert an S-Plus program to R. Since I'm a SAS
Since I'm a SAS >> programmer I'm not facile is either S-Plus or R, so I need some >> help. All I did was convert the underscores in S-Plus to the >> assignment operator<-. Here are the first few lines of the S-Plus >> file: >> >> sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, tol=0.01, >> tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) { ### for method 1 if >> (method==1) { ne1 _ ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) >> return(ne=ne1) } >> >> >> My translation looks like this: >> >> sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, >> tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) { ### for >> method 1 if (method==1) { >> ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) return(ne=ne1) } >> >> The program runs without throwing errors, but I'm not getting any >> ourput in the console. This is where it should be, right? I think >> I have this set up correctly. I'm using method=3 which only >> requires nc and d to be specified. Any ideas why I'm not seeing >> output? > > Long shot: the code you posted looked like (and hard to tell without > indentation) just a bunch of function definitions. R won't actually > do anything unless you call those functions with some parameters. > > So, when you say you get no output when you 'run' the code, what > exactly do you mean by 'run' the code? What I would do is: > > 1. Put the code in a file called 'whatever.R'. 2. Start R, and do > source("whatever.R"). That defines the functions. do "ls()" and you > should see them. 3. Call one of the functions: sshc(100,10) > > I'd call that, in R terms, "calling the sshc function" rather than > running anything. > > Barry -- Kevin E. Thorpe Biostatistician/Trialist, Applied Health Research Centre (AHRC) Li Ka Shing Knowledge Institute of St. 
Michael's Assistant Professor, Dalla Lana School of Public Health University of Toronto email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016 From ligges at statistik.tu-dortmund.de Wed Oct 5 15:03:10 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Wed, 05 Oct 2011 15:03:10 +0200 Subject: [R] SPlus to R In-Reply-To: <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> Message-ID: <4E8C558E.8070707@statistik.tu-dortmund.de> On 05.10.2011 13:44, Scott Raynaud wrote: > Hope I did this right. I repeated what I'd done before: > > 1) Opened script > 2) Selected run all (this produced my initial post) > > Then as suggested I: > > 3) Typed ls() > 4) Saw that the function was present and issued sshc(100,10) > > Here's what I got: > >> ls() > [1] "c.searchd" "convex" "Epower" "nef" "nef2" "power1.f" > [7] "ss.rand" "sshc" "vertex" >> sshc(100,10) > Error in return(ne = ne, Ep = Ep1) : > multi-argument returns are not permitted > > So it looks like I need to change the return(ne = ne, Ep = Ep1) to two separate lines, correct? No: return(list(ne = ne, Ep = Ep1)) Uwe > > On a brighter note, I did get a power curve as expected. One thing I don't understand is the meaning of the arguments in sshc(100,10). > > Thanks again for your help. > > > ________________________________ > From: Barry Rowlingson > > Cc: "r-help at r-project.org" > Sent: Wednesday, October 5, 2011 4:02 AM > Subject: Re: [R] SPlus to R > > > te: >> I'm trying to convert an S-Plus program to R. Since I'm a SAS programmer I'm not facile in either S-Plus or R, so I need some help. All I did was convert the underscores in S-Plus to the assignment operator<-.
Here are the first few lines of the S-Plus file: >> >> sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, >> tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) >> { >> ### for method 1 >> if (method==1) { >> ne1 _ ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) >> return(ne=ne1) >> } >> >> >> My translation looks like this: >> >> sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, >> tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) >> { >> ### for method 1 >> if (method==1) { >> ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) >> return(ne=ne1) >> } >> >> The program runs without throwing errors, but I'm not getting any output in the console. This is where it should be, right? I think I have this set up correctly. I'm using method=3 which only requires nc and d to be specified. Any ideas why I'm not seeing output? > > Long shot: the code you posted looked like (and hard to tell without > indentation) just a bunch of function definitions. R won't actually do > anything unless you call those functions with some parameters. > > So, when you say you get no output when you 'run' the code, what > exactly do you mean by 'run' the code? What I would do is: > > 1. Put the code in a file called 'whatever.R'. > 2. Start R, and do source("whatever.R"). That defines the functions. > Do "ls()" and you should see them. > 3. Call one of the functions: sshc(100,10) > > I'd call that, in R terms, "calling the sshc function" rather than > running anything. > > Barry
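A minimal sketch of the fix Uwe describes, using a made-up function (sshc_sketch is hypothetical, and its arithmetic is only a placeholder for the real sample-size code, which is not shown in the thread). R's return() accepts a single value, so the S-PLUS idiom return(ne = ne, Ep = Ep1) has to become a named list. The sketch also shows that the old form fails only when the function is called, not when it is defined. That matches the thread: the script ran silently and the error appeared at sshc(100,10).

```r
## Hypothetical stand-in for sshc(): the arithmetic is a placeholder,
## only the way the results are returned matters here.
sshc_sketch <- function(rc, nc = 500, d = 0.5) {
  ne  <- rc + nc   # placeholder for the sample-size result
  Ep1 <- d / 2     # placeholder for the expected-power result
  ## R's return() takes a single value, so bundle both results in a list
  return(list(ne = ne, Ep = Ep1))
}

res <- sshc_sketch(100, 10)  # matched by position: rc = 100, nc = 10
res$ne  # 110
res$Ep  # 0.25

## The S-PLUS form is accepted at definition time but fails when called
bad <- function() return(ne = 1, Ep = 2)
err <- tryCatch(bad(), error = function(e) e)
inherits(err, "error")  # TRUE
```

Arguments in a call such as sshc(100,10) are matched by position. With the translated signature sshc<-function(rc, nc=500, d=.5, method=3, ...), this means rc = 100 and nc = 10, and every other argument keeps its default.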
From r.m.krug at gmail.com Wed Oct 5 15:04:05 2011 From: r.m.krug at gmail.com (Rainer M Krug) Date: Wed, 5 Oct 2011 15:04:05 +0200 Subject: [R] "unload" a library while testing? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From izahn at psych.rochester.edu Wed Oct 5 15:03:57 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Wed, 5 Oct 2011 09:03:57 -0400 Subject: [R] texi2dvi problem when compiling incorrect Latex code In-Reply-To: <1317806850526-3873909.post@n4.nabble.com> References: <1317732235196-3870827.post@n4.nabble.com> <1317806850526-3873909.post@n4.nabble.com> Message-ID: I think we need more details about your setup. It works as I described (prints errors, then returns to the R prompt) in all configurations I've tried, including the Linux terminal, Emacs ESS and the Eclipse StatET console setup (I couldn't get RJ working). What operating system are you using? R version? Eclipse/StatET version? Does it work when you run R directly from a terminal (i.e., not through Eclipse)? Best, Ista On Wed, Oct 5, 2011 at 5:27 AM, syrvn wrote: > Hi Ista, > > thanks for your reply. > > If I understood correctly you run your R within Eclipse but as the Launch Type > you use Rterm rather than RJ. > > I changed my configuration so that R is now launched as Rterm and NOT as RJ > and I also removed the > quiet=FALSE from my configuration. Unfortunately, I still have the same > problem. > To create an error in my latex code I just typed the following: \asdasd in > one of my .tex files. > When I compile my document by hand using the Mac OS X / UNIX terminal I get > the following latex compiling output: > > Underfull \hbox (badness 10000) in paragraph at lines 4--7 > > ) (../abstract.tex > Underfull \hbox (badness 10000) in paragraph at lines 1--12 > > ../abstract.tex:15: Undefined control sequence. > l.15 ?\asdasdas > > ? > > It stops at the question mark and waits for user input. If I press enter it
If I press enter it > continues and finally stops with an error which is fine. > The only problem is that if I do it in R the console does not print > everything until the question mark and therefore I cannot just > press enter to let latex finish compiling the code. > > I don't know how to get around this. > > > Best syrvn > > -- > View this message in context: http://r.789695.n4.nabble.com/texi2dvi-problem-when-compiling-incorrect-Latex-code-tp3870827p3873909.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From mtmorgan at fhcrc.org Wed Oct 5 15:05:56 2011 From: mtmorgan at fhcrc.org (Martin Morgan) Date: Wed, 05 Oct 2011 06:05:56 -0700 Subject: [R] unable to install 'pasilla' package on R In-Reply-To: References: Message-ID: <4E8C5634.50303@fhcrc.org> On 10/05/2011 01:31 AM, sridhar wrote: > I am trying to install or load pasilla package on R. i am getting the > following error. Please let me know how to install pasilla on R. > > biocLite("pasilla") > Using R version 2.13.2, biocinstall version 2.8.4. > Installing Bioconductor version 2.8 packages: > [1] "pasilla" > Please wait... > > Installing package(s) into ?C:/Users/Sridhar/Documents/R/win-library/2.13? > (as ?lib? is unspecified) > Warning message: > In getDependencies(pkgs, dependencies, available, lib) : > package ?pasilla? is not available (for R version 2.13.2) Hi Sridhar -- pasilla is a Bioconductor package http://bioconductor.org/ http://bioconductor.org/packages/devel/data/experiment/ so please ask there. 
http://bioconductor.org/help/mailing-list/ It is only available in R-2.14.0 alpha and later, so update your R. Best, Martin > > > Best Regards, > Sridhar Vukkadapu. > > [[alternative HTML version deleted]] > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 From renaud at mancala.cbio.uct.ac.za Wed Oct 5 15:07:24 2011 From: renaud at mancala.cbio.uct.ac.za (Renaud Gaujoux) Date: Wed, 05 Oct 2011 15:07:24 +0200 Subject: [R] Behaviour of 'source' with URLs and proxy In-Reply-To: References: <4E8C2DD8.9050202@cbio.uct.ac.za> <4E8C34FC.20804@cbio.uct.ac.za> <4E8C5195.1040602@cbio.uct.ac.za> Message-ID: <4E8C568C.2040805@cbio.uct.ac.za> So source() always reads a URL using the internal method, because it reads it chunk by chunk, and I suppose the other methods of download.file (wget, etc.) do not support that (?). I guess the only way of finding out where the reading process gets stuck is to go into the C code and add more tracking messages. Will try this. Thank you. On 05/10/2011 14:49, Prof Brian Ripley wrote: > On Wed, 5 Oct 2011, Renaud Gaujoux wrote: > >> >> On 05/10/2011 13:45, Prof Brian Ripley wrote: >>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote: >>> >>>> From the help page ?file I -- had -- read the following: >>>> >>>> "For 'url' the description is a complete URL, including scheme >>>> (such as 'http://', 'ftp://' or 'file://'). Proxies can be >>>> specified for HTTP and FTP 'url' connections: see 'download.file'." >>> >>> So you should have known that it was the same as url()! >> >> I agree.
I just thought -- incorrectly -- that any attempt to >> download a file from R would eventually call the same C code as >> download.file. Or maybe > > It does. But download.file(method="wget") does not call that C code .... > >> source() does not download and source, but reads the file on the fly? > > That is true too, but then downloading a file is always done in chunks. > >>> >>>> From the internet.info messages it seems that the proxy is actually >>>> used, but somehow differently than what download.file does (via wget). >>> >>> No, somewhat differently than *wget* does. As that help page says, >>> the section on proxies only refers to the internal method. >>> >>>> Is source supposed to work through a proxy? >>> >>> Yes, and it has been tested to do so. But not tested on your proxy >>> .... >> >> OK, I agree that my settings look special, but in the end it is >> supposed to be a plain local proxy with no authentication. >> The proxy is effectively used by the internal method and, from the >> messages (below), the remote file is opened, http headers are >> returned, but nothing else happens and I have to cancel the command >> (Ctrl-C). >> >> This is where I would like to have some input, so that I can work out >> the issue. >> I tried to go through the C code for internet with no great luck: >> it seems that in_R_HTTPRead and RxmlNanoHTTPRead would be the place to look >> at. >> >> Any idea on what would cause these functions to hang (infinite loop, >> communication problem, ...)? >> I know, I am too curious.
>> >> Thank you >> >> >>> Sys.getenv('http_proxy') >> [1] "http://localhost:8080/" >>> Sys.getenv('no_proxy') >> [1] "localhost,127.0.0.0/8,*.local" >>> options(internet.info=0) >>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt") >> trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt' >> Content type 'text/plain' length 1209 bytes >> opened URL >> ^C >> There were 15 warnings (use warnings() to see them) >>> warnings() >> Warning messages: >> 1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> connected to 'localhost' on port 8080. >> 2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0 >> Host: lib.stat.cmu.edu >> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu) >> >> 3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> <- HTTP/1.1 200 OK >> 4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> <- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004 >> 5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> <- Connection: Keep-Alive >> 6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> <- Proxy-Connection: Keep-Alive >> 7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> <- Content-Length: 1209 >> 8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> <- Age: 747 >> 9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", >> ... : >> <- Date: Wed, 05 Oct 2011 11:52:25 GMT >> 10: In >> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : >> <- Content-Type: text/plain >> 11: In >> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : >> <- ETag: "5c700f3-4b9-399383c0" >> 12: In >> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... 
: >> <- Server: Apache >> 13: In >> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : >> <- Accept-Ranges: bytes >> 14: In >> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : >> <- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT >> 15: In >> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... : >> Code 200, content-type 'text/plain' >> >>> >>> >>>> >>>> -- >>>> Renaud Gaujoux >>>> Computational Biology - University of Cape Town >>>> South Africa >>>> >>>> >>>> On 05/10/2011 12:26, Prof Brian Ripley wrote: >>>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I am having troubles sourcing a file from our local network from R. >>>>>> It looks like this file are not properly accessed by 'source', >>>>>> even they can be downloaded with download.file. (See below my >>>>>> settings and some tests I did). I ended up with a work around, >>>>>> but I would like to understand what is going on. >>>>>> >>>>>> Doesn't source/readLines uses the same mechanism as download.file >>>>>> to access URLs? >>>>> >>>>> No. They use url() connections. See ?file. >>>>> >>>>>> >>>>>> Thank you. >>>>>> >>>>>> Renaud >>>>>> >>>>>> My setting: >>>>>> - I am using R 2.13.2 on Ubuntu 11.04. >>>>>> - I am accessing internet through a proxy (set up with cntlm, not >>>>>> sure if this is the issue but I don't know how to check without >>>>>> it). This means that http_proxy='http://localhost:8080/'. >>>>>> - We have local CRNA/BioConductor mirrors that can be accessed >>>>>> without going through the proxy. >>>>>> - My .Rprofile sources a file 'setrepos.R' on the local network, >>>>>> that sets all relevant repos to our local mirrors. >>>>>> >>>>>> From the shell: >>>>>> - I can wget any URL (local or internet) from command line >>>>>> without a problem. >>>>>> - In particular I can wget the file 'setrepos.R' from command line. 
>>>>>> >>>>>> Symptoms: >>>>>> - with options(download.file.method='wget'), I can download any >>>>>> URL (local or internet) with download.file >>>>>> - I _cannot_ source any local or internet URL if http_proxy is >>>>>> set. It simply freezes. Using internet.info=0 gives the following >>>>>> messages: >>>>>> ############ >>>>>> Warning messages: >>>>>> 1: In file(file, "r", encoding = encoding) : >>>>>> using HTTP proxy 'http://localhost:8080/' >>>>>> 2: In file(file, "r", encoding = encoding) : >>>>>> connected to 'localhost' on port 8080. >>>>>> 3: In file(file, "r", encoding = encoding) : >>>>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0 >>>>>> Host: *OUR_HOST* >>>>>> Pragma: no-cache >>>>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu) >>>>>> >>>>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK >>>>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 >>>>>> SRVWINTMG004 >>>>>> 6: In file(file, "r", encoding = encoding) : <- Connection: >>>>>> Keep-Alive >>>>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: >>>>>> Keep-Alive >>>>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597 >>>>>> 9: In file(file, "r", encoding = encoding) : >>>>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT >>>>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type: >>>>>> text/plain >>>>>> 11: In file(file, "r", encoding = encoding) : >>>>>> <- ETag: "30b8018-63d-4a627b821c980" >>>>>> 12: In file(file, "r", encoding = encoding) : >>>>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 >>>>>> PHP/5.2.6-2ubuntu4.6 with Suhosin-Patch mod_python/3.3.1 >>>>>> Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_perl/2.0.4 >>>>>> Perl/v5.10.0 >>>>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: >>>>>> bytes >>>>>> 14: In file(file, "r", encoding = encoding) : >>>>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT >>>>>> 15: In file(file, "r", encoding = encoding) : Code 200, 
>>>>>> content-type 'text/plain' >>>>>> ############ >>>>>> >>>>>> - Setting options(download.file.method='wget') before sourcing >>>>>> does not change the behaviour. >>>>>> - However, I can source any local URL if http_proxy='', without >>>>>> changing download.file.method. But then download.file does not >>>>>> work for internet URL any more since the proxy settings are >>>>>> wrong. I could set http_proxy='', then source, then restore the >>>>>> proxy settings and set options(download.file.method='wget'). But >>>>>> this is just a work around and I would like to understand what is >>>>>> going on. >>>>>> >>>>>> Session Info: >>>>>> >>>>>> R version 2.13.2 (2011-09-30) >>>>>> Platform: x86_64-pc-linux-gnu (64-bit) >>>>>> >>>>>> locale: >>>>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C >>>>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8 >>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 >>>>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C >>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C >>>>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C >>>>>> >>>>>> attached base packages: >>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>> >>>>>> other attached packages: >>>>>> [1] devtools_0.4 >>>>>> >>>>>> loaded via a namespace (and not attached): >>>>>> [1] RCurl_1.6-10 tools_2.13.2 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> >>>>>> Renaud Gaujoux >>>>>> Computational Biology - University of Cape Town >>>>>> South Africa >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ### >>>>>> >>>>>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT >>>>>> policies and e-mai...{{dropped:5}} >>>>>> >>>>>> ______________________________________________ >>>>>> R-help at r-project.org mailing list >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide >>>>>> http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible code. 
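As a sketch of another workaround, assuming the proxy only trips up the url() connection that source() opens: fetch the script with download.file(), which uses options(download.file.method), e.g. "wget", when no method is given, and then source() the local copy. The helper name source_via_download() is hypothetical, and this sidesteps rather than explains why the internal method hangs.

```r
# Hypothetical helper: download a remote R script to a temporary file and
# source that file, so source() never opens a url() connection itself.
source_via_download <- function(url, envir = globalenv(), quiet = TRUE) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)           # clean up the temporary copy
  # No 'method' argument: download.file() falls back to
  # getOption("download.file.method"), e.g. "wget" as set in this thread.
  status <- download.file(url, destfile = tmp, quiet = quiet)
  if (status != 0L) stop("download.file() returned status ", status)
  invisible(source(tmp, local = envir))
}

# Usage (placeholder URL):
# options(download.file.method = "wget")
# source_via_download("http://example.org/~user/R/setrepos.R")
```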
>>>>>> >>>>> >>>> >>>> >>>> >>>> ### >>>> >>>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT >>>> policies and e-mail disclaimer published on our website at >>>> http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable >>>> from +27 21 650 9111. This e-mail is intended only for the >>>> person(s) to whom it is addressed. If the e-mail has reached you in >>>> error, please notify the author. If you are not the intended >>>> recipient of the e-mail you may not use, disclose, copy, redirect >>>> or print the content. If this e-mail is not related to the business >>>> of UCT it is sent by the sender in the sender's individual capacity. >>>> >>>> ### >>>> >>>> >>> >> >> >> >> ### >> >> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT >> policies and e-mail disclaimer published on our website at >> http://www.uct.ac.za/about/policies/emaildisclaimer/ or obtainable >> from +27 21 650 9111. This e-mail is intended only for the person(s) >> to whom it is addressed. If the e-mail has reached you in error, >> please notify the author. If you are not the intended recipient of >> the e-mail you may not use, disclose, copy, redirect or print the >> content. If this e-mail is not related to the business of UCT it is >> sent by the sender in the sender's individual capacity. >> >> ### >> >> > ### UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and e-mai...{{dropped:5}} From E.Vettorazzi at uke.de Wed Oct 5 15:11:30 2011 From: E.Vettorazzi at uke.de (Eik Vettorazzi) Date: Wed, 5 Oct 2011 15:11:30 +0200 Subject: [R] help with regexp In-Reply-To: <1317815812.96272.YahooMailClassic@web28215.mail.ukl.yahoo.com> References: <1317815812.96272.YahooMailClassic@web28215.mail.ukl.yahoo.com> Message-ID: <4E8C5782.70801@uke.de> Hi Jannis, just use the backreferences in gsub, see ?gsub, -> replacement test <- c('filename_1_def.pdf', 'filename_2_abc.pdf') gsub(".*_([A-z]+)\\.pdf", "\\1", test) hth. 
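Two notes on the regular expressions in this thread, with a minimal sketch. Jannis's grep() attempt (quoted below) cannot work as written: grep(value = TRUE) returns whole matching elements rather than the matched substrings, and the call also omits the x argument (test). In the pattern above, the range [A-z] also covers the ASCII characters [ \ ] ^ _ and ` that sit between Z and a (at least in the C locale), so [[:alpha:]] is a safer class. The regmatches() variant assumes a recent R that provides regmatches().

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# One sub() call with a capture group: keep only the letters between the
# last underscore and the ".pdf" extension.
parts_sub <- sub("^.*_([[:alpha:]]+)\\.pdf$", "\\1", test)

# Alternative: Perl look-behind/look-ahead match just the wanted part,
# and regmatches() extracts the matched substrings.
parts_re <- regmatches(test,
                       regexpr("(?<=_)[[:alpha:]]+(?=\\.pdf$)", test, perl = TRUE))

parts_sub
parts_re
```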
On 05.10.2011 13:56, Jannis wrote: > Dear list members, > > > I am stuck with using regular expressions. > > > Imagine I have a vector of character strings like: > > test <- c('filename_1_def.pdf', 'filename_2_abc.pdf') > > How could I use regular expressions to extract only the 'def'/'abc' parts of these strings? > > > My attempts yielded no results: > > testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE) > > Somehow I seem to miss some important concept here. Until now I always used nested sub expressions like: > > testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test)) > > > but this tends to become cumbersome and I was wondering whether there is a more elegant way to do this? > > > > Thanks for any help > > Jannis > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 -- Mandatory disclosures under the German Act on Electronic Commercial Registers, Cooperative Registers and the Company Register (EHUG): Universitätsklinikum Hamburg-Eppendorf; public-law corporation; place of jurisdiction: Hamburg. Board members: Prof. Dr. Guido Sauter (Deputy Chairman), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus From michael.weylandt at gmail.com Wed Oct 5 15:14:47 2011 From: michael.weylandt at gmail.com (R.
Michael Weylandt ) Date: Wed, 5 Oct 2011 09:14:47 -0400 Subject: [R] mean of 3D arrays In-Reply-To: <5E6A2740-10A7-4CED-9B41-32A109317325@googlemail.com> References: <5E6A2740-10A7-4CED-9B41-32A109317325@googlemail.com> Message-ID: <2A0D0E4F-B6F0-4476-9CFC-7071D2E0AEAB@gmail.com> (x1+x2+x3)/3 I'm not aware of a "pmean" function but it wouldn't be hard to homebrew one if you are comfortable with the ... argument I'll draft one up and send it along Michael Weylandt On Oct 5, 2011, at 9:00 AM, Martin Batholdy wrote: > Hi, > > I have multiple three dimensional arrays. > > Like this: > > x1 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) > x2 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) > x3 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) > > > Now I would like to compute the mean for each corresponding cell. > As a result I want to get one 3D array (10 x 10 x 10) in which at position x, y, z is the mean of the corresponding values of x1, x2 and x3 (at position x, y, z). > > > How can I do this? > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From E.Vettorazzi at uke.de Wed Oct 5 15:15:59 2011 From: E.Vettorazzi at uke.de (Eik Vettorazzi) Date: Wed, 5 Oct 2011 15:15:59 +0200 Subject: [R] "unload" a library while testing? In-Reply-To: References: Message-ID: <4E8C588F.60501@uke.de> Hi Rainer, for better or worse "unlibrary" actually is done by detach in R, ?detach #first example cheers Am 05.10.2011 15:04, schrieb Rainer M Krug: > Hi > > I am testing a package, and after I make changes, I have to close R and open > R again to load the new version (same version number) of the package I am > working on. 
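A minimal sketch of the detach-and-reload cycle suggested in the replies, assuming the package has been reinstalled in the meantime. reload_package() is a hypothetical helper; detach(unload = TRUE), unloadNamespace() and library() are base R. Unloading a namespace does not always release compiled code cleanly, so restarting R remains the safest option for packages with C or Fortran code.

```r
# Hypothetical helper: detach an attached package, unload its namespace,
# then attach it again so the freshly installed version is picked up.
reload_package <- function(pkg) {
  search_name <- paste0("package:", pkg)
  if (search_name %in% search())
    detach(search_name, unload = TRUE, character.only = TRUE)
  if (pkg %in% loadedNamespaces())
    unloadNamespace(pkg)                 # namespace may stay loaded if only detached
  library(pkg, character.only = TRUE)
}

# Usage while developing (hypothetical package name):
# reload_package("myPackage")
```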
So my question: > > is there a function which removes a package, i.e. > > library(myPackage) > >> Package is loaded > unlibrary(myPackage) > >> package is not loaded any more > > Thanks, > > Rainer > -- Eik Vettorazzi Department of Medical Biometry and Epidemiology University Medical Center Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 From izahn at psych.rochester.edu Wed Oct 5 15:18:18 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Wed, 5 Oct 2011 09:18:18 -0400 Subject: [R] "unload" a library while testing? In-Reply-To: References: Message-ID: Hi Rainer, On Wed, Oct 5, 2011 at 9:04 AM, Rainer M Krug wrote: > Hi > > I am testing a package, and after I make changes, I have to close R and open > R again to load the new version (same version number) of the package I am > working on. So my question: > > is there a function which removes a package, i.e. > > library(myPackage) > >> Package is loaded > unlibrary(myPackage) The function is 'detach(package:)', possibly with the unload option. Best, Ista > >> package is not loaded any more > > Thanks, > > Rainer > > -- > Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, > UCT), Dipl. Phys. (Germany) > > Centre of Excellence for Invasion Biology > Stellenbosch University > South Africa > > Tel: +33 - (0)9 53 10 27 44 > Cell: +33 - (0)6 85 62 59 98 > Fax (F): +33 - (0)9 58 10 27 44 > > Fax (D): +49 - (0)3 21 21 25 22 44 > > email: Rainer at krugs.de > > Skype: RMkrug > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From r.m.krug at gmail.com Wed Oct 5 15:19:35 2011 From: r.m.krug at gmail.com (Rainer M Krug) Date: Wed, 5 Oct 2011 15:19:35 +0200 Subject: [R] "unload" a library while testing? In-Reply-To: <4E8C588F.60501@uke.de> References: <4E8C588F.60501@uke.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From rshepard at appl-ecosys.com Wed Oct 5 15:21:56 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Wed, 5 Oct 2011 06:21:56 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: On Wed, 5 Oct 2011, Rich Shepard wrote: > First thing this morning I'm upgrading to 2.13.2 and hoping that this > fixes an issue that just showed up yesterday afternoon: not being able to > access function help pages. For example, I tried ?subset and ?split because > I thought the latter is really what I want, yet R told me no help was found. > Strange; it was there a week ago. Yep. The upgrade brought back the help system. Rich From heverkuhn at gmail.com Wed Oct 5 15:28:46 2011 From: heverkuhn at gmail.com (Heverkuhn Heverkuhn) Date: Wed, 5 Oct 2011 08:28:46 -0500 Subject: [R] break.axis all range of data In-Reply-To: References: <4E8C2DD4.3070407@bitwrit.com.au> Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From heverkuhn at gmail.com Wed Oct 5 15:31:51 2011 From: heverkuhn at gmail.com (Heverkuhn Heverkuhn) Date: Wed, 5 Oct 2011 08:31:51 -0500 Subject: [R] break.axis all range of data In-Reply-To: References: <4E8C2DD4.3070407@bitwrit.com.au> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Wed Oct 5 15:39:04 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Wed, 5 Oct 2011 09:39:04 -0400 Subject: [R] mean of 3D arrays In-Reply-To: <2A0D0E4F-B6F0-4476-9CFC-7071D2E0AEAB@gmail.com> References: <5E6A2740-10A7-4CED-9B41-32A109317325@googlemail.com> <2A0D0E4F-B6F0-4476-9CFC-7071D2E0AEAB@gmail.com> Message-ID: As promised

### Untested
pmean <- function(...){
  dotArgs <- list(...)
  l <- length(dotArgs)
  if( l == 0L ) stop("no arguments")
  temp <- dotArgs[[1]]
  if ( l > 1L ) {for(i in 2L:l) {temp <- temp + dotArgs[[i]]}}
  temp/l
}

Clunky but gets the job done. It's still too early for me to think straight, so I'll let someone else kill the loop and add error checking if interested. Michael Weylandt On Oct 5, 2011, at 9:14 AM, "R. Michael Weylandt " wrote: > (x1+x2+x3)/3 > > I'm not aware of a "pmean" function but it wouldn't be hard to homebrew one if you are comfortable with the ... argument > > I'll draft one up and send it along > > Michael Weylandt > > On Oct 5, 2011, at 9:00 AM, Martin Batholdy wrote: > >> Hi, >> >> I have multiple three dimensional arrays. >> >> Like this: >> >> x1 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) >> x2 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) >> x3 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) >> >> >> Now I would like to compute the mean for each corresponding cell. >> As a result I want to get one 3D array (10 x 10 x 10) in which at position x, y, z is the mean of the corresponding values of x1, x2 and x3 (at position x, y, z). >> >> >> How can I do this?
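The loop in pmean() can be replaced with Reduce(), and a dim() check covers the error checking mentioned above; stacking the arrays into one 4-D array and calling rowMeans() with dims = 3 is another option. A minimal sketch; stack_mean() is a hypothetical name, and both functions assume every argument has the same dimensions.

```r
# Element-wise mean of any number of same-shaped arrays.
pmean <- function(...) {
  args <- list(...)
  if (length(args) == 0L) stop("no arguments")
  dims <- lapply(args, dim)
  # identical(dims[[i]], dims[[1]]) for every argument
  if (!all(vapply(dims, identical, logical(1), dims[[1L]])))
    stop("all arguments must have the same dim()")
  Reduce(`+`, args) / length(args)        # ((x1 + x2) + x3) / 3, no explicit loop
}

# Alternative: bind the arrays along a new last dimension and average over it.
stack_mean <- function(...) {
  args <- list(...)
  d <- dim(args[[1L]])
  arr <- array(unlist(args, use.names = FALSE), dim = c(d, length(args)))
  rowMeans(arr, dims = length(d))         # keeps the first length(d) dimensions
}

x1 <- array(rnorm(1000, 1, 2), dim = c(10, 10, 10))
x2 <- array(rnorm(1000, 1, 2), dim = c(10, 10, 10))
x3 <- array(rnorm(1000, 1, 2), dim = c(10, 10, 10))
m <- pmean(x1, x2, x3)
dim(m)
```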
>> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. From petr.pikal at precheza.cz Wed Oct 5 15:44:09 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Wed, 5 Oct 2011 15:44:09 +0200 Subject: [R] Odp: repeating categorical variable codes In-Reply-To: References: Message-ID: Hi > > I would appreciate help in knowing how to repeat categorical variable code > given in column=A, by the number in a matching column=B. > For example, I have a categorical variable code attributed to a household=A > and want to replicate the code for all member of the household, as given in > column=B. I would like to have one sequence of categorical variable codes > for individuals in column C. I have ~9000 values in A and my C will be > ~52000. > > E.g > (A) (B) (C) > 1 1 1 > 2 2 2 > 1 1 2 > 2 3 1 > 2 > 2 > 2 > > Any ideas would be gratefully accepted by a novice R user. I am not sure if I understand your problem but does rep(A, B) give you what you want? rep(letters[1:3], 1:3) [1] "a" "b" "b" "c" "c" "c" Regards Petr > > Ric > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From petr.pikal at precheza.cz Wed Oct 5 15:54:52 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Wed, 5 Oct 2011 15:54:52 +0200 Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: Hi > > On Wed, 5 Oct 2011, Petr PIKAL wrote: > > > Hm. I seldom use such approach. 
In your original request you said you want > > split your data to smaller data frames based on sites > > Petr, > > I need the additional information in the database, too. But you do not lose it: your data frame is cut according to the sites variable and put into a list; see > iris.spl <- split(iris, iris$Species) > str(iris.spl) List of 3 $ setosa :'data.frame': 50 obs. of 5 variables: ..$ Sepal.Length: num [1:50] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... ..$ Sepal.Width : num [1:50] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... > > > From what we know it is difficult to say if there is some common feature > > in the site variable. If it is organised like > > XY-N > > you can simply make a new variable from the first two letters > > Unfortunately, the site designations are not so uniform. As I went through > the process of re-doing the data I discovered this lack of consistency > resulting in duplicate records because one site had been designated XX-n and > XXn. Had to clean those up, too. > > > sites <- substr(chemdata$site,1,2) That would not matter as long as the first two letters designate the required grouping variable, which I called sites. Regards Petr > > > > then you can split your data frame according to sites > > > > chem.spl <- split(chemdata, sites) > > > > and do anything with your split data frames organised in a list > > First thing this morning I'm upgrading to 2.13.2 and hoping that this > fixes an issue that just showed up yesterday afternoon: not being able to > access function help pages. For example, I tried ?subset and ?split because > I thought the latter is really what I want, yet R told me no help was found. > Strange; it was there a week ago. > > Thanks, > > Rich > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
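A minimal sketch of the split() approach for inconsistently written site codes such as XY-1 and XY2, using a made-up chemdata with hypothetical columns. Taking the leading run of letters with sub() puts both spellings in the same group, and every column is kept in each list element. The site_clean step is an optional, assumed normalisation that inserts the missing hyphen.

```r
# Hypothetical data: site codes written both with and without a hyphen.
chemdata <- data.frame(
  site  = c("XY-1", "XY2", "AB-1", "AB-3", "XY-4"),
  param = c("Cu", "Zn", "Cu", "Pb", "Zn"),
  conc  = c(0.12, 0.30, 0.05, 0.02, 0.41),
  stringsAsFactors = FALSE
)

# Optional clean-up: "XY2" becomes "XY-2", "XY-1" is left unchanged.
chemdata$site_clean <- sub("^([[:alpha:]]+)-?", "\\1-", chemdata$site)

# Grouping variable: the leading letters of the site code.
site_group <- sub("^([[:alpha:]]+).*$", "\\1", chemdata$site)

# One data frame per group; all columns are retained.
chem.spl <- split(chemdata, site_group)
names(chem.spl)
sapply(chem.spl, nrow)
```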
From mentor_ at gmx.net Wed Oct 5 16:02:33 2011 From: mentor_ at gmx.net (syrvn) Date: Wed, 5 Oct 2011 07:02:33 -0700 (PDT) Subject: [R] texi2dvi problem when compiling incorrect Latex code In-Reply-To: References: <1317732235196-3870827.post@n4.nabble.com> <1317806850526-3873909.post@n4.nabble.com> Message-ID: <1317823353799-3874594.post@n4.nabble.com> Hi Ista, it's weird; I don't know why this happens. I tried so many different ways, but now I have finally found a solution. I wrote a little shell script:

pdflatex -halt-on-error body.tex
bibtex body.aux
pdflatex -halt-on-error body.tex
pdflatex -halt-on-error body.tex

which does the job. So my "External Tools Configuration" now looks like the following:

library(tools)
setwd("${container_loc}")
system("./body.sh", intern=TRUE)

I get the output into R, and as soon as there is an error in my LaTeX code the compilation/processing stops and the output is still put into the R console. This was possible by adding the -halt-on-error parameter. Thanks for your support anyway. Best, Syrvn -- View this message in context: http://r.789695.n4.nabble.com/texi2dvi-problem-when-compiling-incorrect-Latex-code-tp3870827p3874594.html Sent from the R help mailing list archive at Nabble.com. From petr.pikal at precheza.cz Wed Oct 5 16:05:55 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Wed, 5 Oct 2011 16:05:55 +0200 Subject: [R] repeating categorical variable codes In-Reply-To: References: Message-ID: > > Hi Petr, > > Thank you for the reply. Unfortunately my repetition is not uniform, but > dependent on values given in column B, which varies by each row. > > Does this make it any clearer? Not much. Let's say > A <- letters[1:10] > B <- sample(1:3, 10, replace =TRUE) > A [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" > B [1] 2 3 2 3 1 1 1 3 1 1 Then > rep(A,B) [1] "a" "a" "b" "b" "b" "c" "c" "d" "d" "d" "e" "f" "g" "h" "h" "h" "i" "j" gives you repeating letters according to the values in the second vector.
If this is not what you want, try to send some artificial example which illustrates what you really have and what is desired result. Regards Petr > > Ric > On 5 October 2011 23:44, Petr PIKAL wrote: > Hi > > > > > I would appreciate help in knowing how to repeat categorical variable > code > > given in column=A, by the number in a matching column=B. > > For example, I have a categorical variable code attributed to a > household=A > > and want to replicate the code for all member of the household, as given > in > > column=B. I would like to have one sequence of categorical variable > codes > > for individuals in column C. I have ~9000 values in A and my C will be > > ~52000. > > > > E.g > > (A) (B) (C) > > 1 1 1 > > 2 2 2 > > 1 1 2 > > 2 3 1 > > 2 > > 2 > > 2 > > > > Any ideas would be gratefully accepted by a novice R user. > I am not sure if I understand your problem but does rep(A, B) > > give you what you want? > > rep(letters[1:3], 1:3) > [1] "a" "b" "b" "c" "c" "c" > > Regards > Petr > > > > > > Ric > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > From b.rowlingson at lancaster.ac.uk Wed Oct 5 16:27:34 2011 From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson) Date: Wed, 5 Oct 2011 15:27:34 +0100 Subject: [R] SPlus to R In-Reply-To: <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> Message-ID: On Wed, Oct 5, 2011 at 12:44 PM, Scott Raynaud wrote: > Hope I did this right.? 
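The error quoted below ("multi-argument returns are not permitted") comes from a difference between S-PLUS and R rather than from the sample-size algorithm: older S code could write return(ne = ne, Ep = Ep1) to hand back several named values, while R's return() accepts a single object. Splitting it into two return lines would not help, because the first return() already exits the function. The usual port wraps the values in list(); sshc_tail() and the numbers below are placeholders, and in sshc() itself only the return line would change.

```r
# Hypothetical stand-in for the last line of sshc(): R needs one object,
# so the named values go into a list (callers can still use $ne and $Ep).
sshc_tail <- function(ne, Ep1) {
  return(list(ne = ne, Ep = Ep1))
}

res <- sshc_tail(ne = 120, Ep1 = 0.8)   # placeholder values
res$ne
res$Ep
```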
I repeated what I'd done before: > > 1) Opened script > 2) Selected run all (this produced my initial post) > > Then as suggested I: > > 3) Typed ls() > 4) Saw that the function was present and issued sshc(100,10) > > Here's what I got: > >> ls() > [1] "c.searchd" "convex" "Epower" "nef" "nef2" "power1.f" > [7] "ss.rand" "sshc" "vertex" >> sshc(100,10) > Error in return(ne = ne, Ep = Ep1) : > multi-argument returns are not permitted > So it looks like I need to change the return(ne = ne, Ep = Ep1) to two > separate lines, correct? > > On a brighter note, I did get a power curve as expected. One thing I don't > understand is the meaning of the arguments in sshc(100,10). There are some comments in the function code that tell you:

# rc number of response in historical control group
# nc sample size in historical control
# d target improvement = Pe - Pc
# method 1=method based on the randomized design
# 2=Makuch & Simon method (Makuch RW, Simon RM. Sample size considerations
# for non-randomized comparative studies. J of Chron Dis 1980; 3:175-181.
# 3=uniform power method
######## optional Input:

- and so on. Beyond that, I'll have to defer to people who know what this is actually trying to compute... Also, it's highly possible that this code has already been ported to R - lots of things have. If you know what it's meant to compute, then a quick search might get you running quicker. Barry From marcus.nunes at gmail.com Wed Oct 5 16:51:19 2011 From: marcus.nunes at gmail.com (Marcus Nunes) Date: Wed, 5 Oct 2011 10:51:19 -0400 Subject: [R] F-values in nested designs In-Reply-To: References: Message-ID: Dennis, thanks for your help. I've read your email and the references you gave and things are more clear to me. Best, Marcus On Tue, Oct 4, 2011 at 19:28, Dennis Murphy wrote: > > Hi: > > > INB4: if I have a nested design with treatment A and treatment B > > within A, F-values are MSA/MSA(B) and MSA(B)/MSE, correct? How can I
How can I > > make R give these values directly, without further coding? > > This is how to get an equivalent model in lme4, but it probably isn't > what you expect (particularly the 'without further coding' part). > Using your example, > > library('lme4') > dn <- data.frame( y = c(10, 12, 8, 13, 14, 8, 10, 12, > 9, 10, 12, 11, 11, 13, 9, 10, 14, > 11, 10, 9, 8, 9, 8, 8, 13, 14, 7, > 10, 10, 13, 9, 7, 16, 12, 5, 4), > areas = factor(rep(c("m1", "m2", "m3"), each=12)), > sites = factor(rep(1:4, 9))) > > a <- lmer(y ~ areas + (1 | areas:sites), data = dn) > > a > Linear mixed model fit by REML > Formula: y ~ areas + (1 | areas:sites) > Data: dn > AIC BIC logLik deviance REMLdev > 171.1 179 -80.56 167 161.1 > Random effects: > Groups Name Variance Std.Dev. > areas:sites (Intercept) 3.25 1.8028 > Residual 4.50 2.1213 > Number of obs: 36, groups: areas:sites, 12 > > Fixed effects: > Estimate Std. Error t value > (Intercept) 10.750 1.090 9.865 > areasm2 -0.750 1.541 -0.487 > areasm3 -0.750 1.541 -0.487 > > Correlation of Fixed Effects: > (Intr) aresm2 > areasm2 -0.707 > areasm3 -0.707 0.500 > ##------ > > lme4 reports the estimated variance components and their square roots, > the standard deviation components (Std.Dev). The estimated residual > variance component is 4.5, which is the same as the residual MSE from > the Minitab output. The estimated variance component associated with > sites nested within areas (areas:sites) is 3.25. Since the design is > balanced, the expected mean square of this term (assuming the model > assumptions are correct) is $\sigma_e^2 + 3 \sigma_s^2$, which is > estimated by 4.5 + 3(3.25) = 14.25, the observed mean square for sites > within areas, again coinciding with the Minitab output.
However, > lmer() does not report the result of an F-test for the 'significance' > of the sites variance component, because the null hypothesis > $\sigma_s^2 = 0$ is on the boundary of the parameter space and there > are questions about the reliability of p-values for such tests. See > http://rwiki.sciviews.org/doku.php?id=guides:lmer-tests In other > words, don't accept the reported p-value re the sites variance from > Minitab on faith. This answers (even in multi-stratum models in aov() > ) > > > 2) why I don't have an F-value for the nested effect? I realize that R > > calls it Residuals in the first part of the summary, but is there a > > way to make R consider it as another factor? > > To get the fixed effects part of Minitab's ANOVA table with lmer(), > > anova(a) > Analysis of Variance Table > Df Sum Sq Mean Sq F value > areas 2 1.4208 0.71039 0.1579 > > Once again, the p-value is not reported (by design). Assuming that the > specified normal-theory model is correct, the conventional F > test of the null hypothesis of equal area means would be the > mean square ratio of areas to sites, which would have an F(2, 9) > distribution under the null hypothesis. The p-value of that test would > be > > > 1 - pf(0.1579, 2, 9) > [1] 0.85625 > > Apart from the needless test of the sites within areas variance > component, the lmer() output corresponds to that of the Minitab table. > The output from lmer() gives you the capacity to do much more, but it > helps to understand some of the theory behind mixed models first. > > The transition from fixed effects ANOVA to random effects and mixed > models is not a smooth one - multiple sources of random variation > complicate both testing and confidence/prediction interval procedures, > with several messages on R-sig-mixed-models (including the one cited > above) discussing such issues at length. > > As I said, this is probably not what you expected.
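The F ratios Marcus asked about (MSA/MSA(B) for areas and MSA(B)/MSE for sites within areas) can be computed directly from the mean squares in the quoted Minitab table. This is a sketch: the mean squares and degrees of freedom are copied from that table, and the last line checks Dennis's variance-component identity using the lmer() estimates.

```r
# Mean squares and degrees of freedom from the quoted Minitab table
ms_areas <- 2.25;  df_areas <- 2    # areas
ms_sites <- 14.25; df_sites <- 9    # sites within areas
ms_error <- 4.5;   df_error <- 24   # residual

# Areas are tested against sites-within-areas, not against the residual
f_areas <- ms_areas / ms_sites
p_areas <- pf(f_areas, df_areas, df_sites, lower.tail = FALSE)

# Sites within areas are tested against the residual
f_sites <- ms_sites / ms_error
p_sites <- pf(f_sites, df_sites, df_error, lower.tail = FALSE)

round(c(F_areas = f_areas, p_areas = p_areas, F_sites = f_sites, p_sites = p_sites), 4)

# Balanced design, 3 observations per site: E[MS_sites] = sigma_e^2 + 3 * sigma_s^2
ms_sites_expected <- 4.5 + 3 * 3.25   # residual and areas:sites variance components from lmer()
```

`pf(..., lower.tail = FALSE)` gives the same value as Dennis's `1 - pf(...)` but stays accurate for very small p-values. As Dennis notes, the second test (sites within areas) is the one whose interpretation is debatable when sites are treated as random.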
> > Dennis > > > > > On Tue, Oct 4, 2011 at 11:17 AM, Marcus Nunes wrote: > > Hello all > > > > I'm trying to learn how to fit a nested model in R. I found a toy > > example on internet where a dataset that have?3 areas and 4 sites > > within these areas. When I use Minitab to fit a nested model to this > > data, this is the ANOVA table that I got: > > > > Nested ANOVA: y versus areas, sites > > > > Analysis of Variance for y > > Source ?DF ? ? ? ?SS ? ? ? MS ? ? ?F ? ? ?P > > areas ? ?2 ? ?4.5000 ? 2.2500 ?0.158 ?0.856 > > sites ? ?9 ?128.2500 ?14.2500 ?3.167 ?0.012 > > Error ? 24 ?108.0000 ? 4.5000 > > Total ? 35 ?240.7500 > > > > When I use R, this is the ANOVA table that I got: > > > > summary(aov(y ~ areas + Error(areas%in%sites))) > > > > Error: areas:sites > > ? ? ? ? ?Df Sum Sq Mean Sq F value Pr(>F) > > areas ? ? ?2 ? 4.50 ? ?2.25 ?0.1579 0.8563 > > Residuals ?9 128.25 14.25 > > > > Error: Within > > ? ? ? ? ?Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 24 ? ?108 ? ? 4.5 > > Warning message: > > In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular > > > > The results are the same, except for one F-value and I don't > > understand why. Hence, these are my questions: > > > > 1) I searched google and I can't find a reason to have this warning in > > my code. Why is this happening? > > > > 2) why I don't have an F-value for the nested effect? I realize that R > > call it as Residuals in the first part of the summary, but there is a > > way to make R consider it s another factor? > > > > INB4: if I have a nested design with treatment A and treatment B > > within A, F-values are MSA/MSA(B) and MSA(B)/MSE, correct? How can I > > make R give these values directly, without further coding? > > > > Thanks for your help. > > > > Below is my code and information about my system. 
> > ---------------------- > > y = c(10, 12, 8, 13, 14, 8, 10, 12, 9, 10, 12, 11, 11, 13, 9, 10, 14, > > 11, 10, 9, 8, 9, 8, 8, 13, 14, 7, 10, 10, 13, 9, 7, 16, 12, 5, 4) > > areas = as.factor(rep(c("m1", "m2", "m3"), each=12)) > > #sites = as.factor(c(rep(c(1, 2, 3, 4), 3), rep(c(5, 6, 7, 8), 3), > > rep(c(9, 10, 11, 12), 3))) > > sites = as.factor(c(rep(c(1, 2, 3, 4), 9))) > > repl ?= as.factor(rep(c(1, 2, 3), each=4, 3)) > > > > summary(aov(y ~ areas + Error(areas%in%sites))) > > > > summary(aov(y ~ areas + Error(areas%in%sites))) > > Error: areas:sites > > ? ? ? ? ? Df Sum Sq Mean Sq F value Pr(>F) > > areas ? ? ?2 ? 4.50 ? ?2.25 ?0.1579 0.8563 > > Residuals ?9 128.25 ? 14.25 > > Error: Within > > ? ? ? ? ? Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 24 ? ?108 ? ? 4.5 > > Warning message: > > In aov(y ~ areas + Error(areas %in% sites)) : Error() model is singular > > > > > > > > sessionInfo() > > R version 2.13.1 Patched (2011-08-25 r56798) > > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > > > locale: > > [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 > > > > attached base packages: > > [1] splines ? stats ? ? graphics ?grDevices utils ? ? datasets ?methods > > [8] base > > > > other attached packages: > > [1] car_2.0-11 ? ? ? ? survival_2.36-9 ? ?nnet_7.3-1 > > [4] MASS_7.3-14 ? ? ? ?lme4_0.999375-40 ? Matrix_0.999375-50 > > [7] lattice_0.19-33 ? ?nlme_3.1-102 > > > > loaded via a namespace (and not attached): > > [1] grid_2.13.1 ? stats4_2.13.1 tools_2.13.1 > > -- > > Marcus Nunes > > marcus.nunes at gmail.com > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > -- Marcus Nunes marcus.nunes at gmail.com From rshepard at appl-ecosys.com Wed Oct 5 16:51:49 2011 From: rshepard at appl-ecosys.com (Rich Shepard) Date: Wed, 5 Oct 2011 07:51:49 -0700 (PDT) Subject: [R] How to subset() from data frame using specific rows In-Reply-To: References: Message-ID: On Wed, 5 Oct 2011, Petr PIKAL wrote: > But you do not lose them, your data frame is cut according to sites > variable and put into a list I know this, Petr. But adding them to the database table ensures that the information is there, too. This brings up another question, but I should put that on a different thread. It's about process and workflow; that is, when I can use multiple factors in the original data frame and when I need to split and subset the data frame. I think it depends on how many factors can be specified by a particular model or graph. Regardless, I'll hold off on this as I work through these initial exploratory steps. Thanks, Rich From ggrothendieck at gmail.com Wed Oct 5 17:13:31 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Wed, 5 Oct 2011 11:13:31 -0400 Subject: [R] help with regexp In-Reply-To: <1317815812.96272.YahooMailClassic@web28215.mail.ukl.yahoo.com> References: <1317815812.96272.YahooMailClassic@web28215.mail.ukl.yahoo.com> Message-ID: On Wed, Oct 5, 2011 at 7:56 AM, Jannis wrote: > Dear list members, > > > I am stuck with using regular expressions. > > > Imagine I have a vector of character strings like: > > test <- c('filename_1_def.pdf', 'filename_2_abc.pdf') > > How could I use regular expressions to extract only the 'def'/'abc' parts of these strings? > > > My attempts so far yielded no results: > > testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE) > > Somehow I seem to miss some important concept here.
Until now I always used nested sub expressions like: > > testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test)) > > > but this tends to become cumbersome and I was wondering whether there is a more elegant way to do this? > Here are a couple of solutions: # remove everything up to the last _ as well as everything from . onwards gsub(".*_|[.].*", "", test) # extract everything that is not a _ provided it is immediately followed by . library(gsubfn) strapply(test, "([^_]+)[.]", simplify = TRUE) -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From wdunlap at tibco.com Wed Oct 5 18:08:36 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 5 Oct 2011 16:08:36 +0000 Subject: [R] SPlus to R In-Reply-To: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: It looks like this code was written for S+ 4.5 (aka '2000') or before, which was based on S version 3. Try changing return(name1=value1, name2=value2) to return(list(name1=value1, name2=value2)) In S+ from 5.0 onwards, return(name=value) or return(name1=value1, name2=value2) throws away the names, and in R return only takes a single object (and also ignores the name). The c.searchd function in your code ends with return(ne=ne, Ep=Ep1) and the code calling c.searchd() acts as though the writer expects that function to return list(ne=ne, Ep=Ep1) ans <- c.searchd(nc, d, ne, alpha, power, cc, tol1) ... old.ne <- ans$ne Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Scott Raynaud > Sent: Tuesday, October 04, 2011 6:53 PM > To: r-help at r-project.org > Subject: [R] SPlus to R > > I'm trying to convert an S-Plus program to R. Since I'm a SAS programmer I'm not facile in either S-Plus or R, so I need some help.
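Jannis's look-around attempt in the regexp thread failed for two reasons: the grep() call omits the character vector to search (its `x` argument), and grep(value = TRUE) returns whole matching elements, not the matched substrings. Below is a sketch of three base-R ways to pull out the 'def'/'abc' part. The first is Gabor's gsub() solution; the other two are alternatives not taken from the thread.

```r
test <- c("filename_1_def.pdf", "filename_2_abc.pdf")

# Gabor's approach: delete everything up to the last "_" and everything from "." on
via_gsub <- gsub(".*_|[.].*", "", test)

# One capture group: keep only what sits between the last "_" and ".pdf"
via_sub <- sub("^.*_([^_]+)\\.pdf$", "\\1", test)

# Perl look-arounds: regexpr() locates the match, regmatches() extracts it
via_lookaround <- regmatches(test, regexpr("(?<=_)[^_]+(?=\\.pdf$)", test, perl = TRUE))

via_gsub
# [1] "def" "abc"
```

Note that regmatches() with regexpr() silently drops elements that have no match, so its result lines up with `test` only when every element matches; the sub() version leaves non-matching elements unchanged instead.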
All I did was convert the underscores in S-Plus to the assignment > operator <-. Here are the first few lines of the S-Plus file: > > sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, > tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > { > ### for method 1 > if (method==1) { > ne1 _ ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > return(ne=ne1) > } > > > My translation looks like this: > > sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > { > ### for method 1 > if (method==1) { > ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > return(ne=ne1) > } > > The program runs without throwing errors, but I'm not getting any output in the console. This is > where it should be, right? I think I have this set up correctly. I'm using method=3 which only > requires nc and d to be specified. Any ideas why I'm not seeing output? > > Here is the entire output: > > > ## sshc.ssc: sample size calculation for historical control studies > > ## J. Jack Lee (jjlee at mdanderson.org) and Chi-hong Tseng > > ## Department of Biostatistics, Univ. of Texas M.D. Anderson Cancer Center > > ## > > ## 3/1/99 > > ## updated 6/7/00: add loess > > ##------------------------------------------------------------------ > > ######## Required Input: > > # > > # rc number of response in historical control group > > # nc sample size in historical control > > # d target improvement = Pe - Pc > > # method 1=method based on the randomized design > > # 2=Makuch & Simon method (Makuch RW, Simon RM. Sample size considerations > > # for non-randomized comparative studies. J of Chron Dis 1980; 3:175-181. > > # 3=uniform power method > > ######## optional Input: > > # > > # alpha size of the test > > # power desired power of the test > > # tol convergence criterion for methods 1 & 2 in terms of sample size > > # tol1
convergence criterion for method 3 at any given obs Rc in terms of difference > > #????????? of expected power from target > > # tol2?? overall convergence criterion for method 3 as the max absolute deviation > > #????????? of expected power from target for all Rc > > # cc???? range of multiplicative constant applied to the initial values ne > > # l.span smoothing constant for loess > > # > > # Note:? rc is required for methods 1 and 2 but not 3 > > #??????? method 3 return the sample size need for rc=0 to (1-d)*nc > > # > > ######## Output > > # for methdos 1 & 2: return the sample size needed for the experimental group (1 number) > > #??????????????????? for given rc, nc, d, alpha, and power > > # for method 3:????? return the profile of sample size needed for given nc, d, alpha, and power > > #??????????????????? vector $ne contains the sample size corresponding to rc=0, 1, 2, ... nc*(1-d) > > #??????????????????? vector $Ep contains the expected power corresponding to > > #????????????????????? the true pc = (0, 1, 2, ..., nc*(1-d)) / nc > > # > > #------------------------------------------------------------------ > > sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > +????????????? tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > + { > + ### for method 1 > + if (method==1) { > + ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > + return(ne=ne1) > +??????????????? 
} > + ### for method 2 > + if (method==2) { > + ne<-nc > + ne1<-nc+50 > + while(abs(ne-ne1)>tol & ne1<100000){ > + ne<-ne1 > + pe<-d+rc/nc > + ne1<-nef(rc,nc,pe*ne,ne,alpha,power) > + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne)) > + } > + if (ne1>100000) return(NA) > + else return(ne=ne1) > + } > + ### for method 3 > + if (method==3) { > + if (tol1 > tol2/10) tol1<-tol2/10 > + ncstar<-(1-d)*nc > + pc<-(0:ncstar)/nc > + ne<-rep(NA,ncstar + 1) > + for (i in (0:ncstar)) > + { ne[i+1]<-ss.rand(i,nc,d,alpha=.05,power=.8,tol=.01) > + } > + plot(pc,ne,type='l',ylim=c(0,max(ne)*1.5)) > + ans<-c.searchd(nc, d, ne, alpha, power, cc, tol1) > + ### check overall absolute deviance > + old.abs.dev<-sum(abs(ans$Ep-power)) > + ##bad<-0 > + print(round(ans$Ep,4)) > + print(round(ans$ne,2)) > + lines(pc,ans$ne,lty=1,col=8) > + old.ne<-ans$ne > + ##while(max(abs(ans$Ep-power))>tol2 & bad==0){? #### unnecessary ## > + while(max(abs(ans$Ep-power))>tol2){ > + ans<-c.searchd(nc, d, ans$ne, alpha, power, cc, tol1) > + abs.dev<-sum(abs(ans$Ep-power)) > + print(paste(" old.abs.dev=",old.abs.dev)) > + print(paste("???? abs.dev=",abs.dev)) > + ##if (abs.dev > old.abs.dev) { bad<-1} > + old.abs.dev<-abs.dev > + print(round(ans$Ep,4)) > + print(round(ans$ne,2)) > + lines(pc,old.ne,lty=1,col=1) > + lines(pc,ans$ne,lty=1,col=8) > + ### add convex > + ans$ne<-convex(pc,ans$ne)$wy > + ### add loess > + ###old.ne<-ans$ne > + loess.ne<-loess(ans$ne ~ pc, span=l.span) > + lines(pc,loess.ne$fit,lty=1,col=4) > + old.ne<-loess.ne$fit > + ###readline() > + } > + return(ne=ans$ne, Ep=ans$Ep) > +??????????????? 
} > + } > > > > ## needed for method 1 > > nef2<-function(rc,nc,re,ne,alpha,power){ > + za<-qnorm(1-alpha) > + zb<-qnorm(power) > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > + ans<- 1/(4*(xc-xe)^2/(za+zb)^2-1/(nc+0.5)) - 0.5 > + return(ans) > + } > > ## needed for method 2 > > nef<-function(rc,nc,re,ne,alpha,power){ > + za<-qnorm(1-alpha) > + zb<-qnorm(power) > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > + ans<-(za*sqrt(1+(ne+0.5)/(nc+0.5))+zb)^2/(2*(xe-xc))^2-0.5 > + return(ans) > + } > > ## needed for method 3 > > c.searchd<-function(nc, d, ne, alpha=0.05, power=0.8, cc=c(0.1,2),tol1=0.0001){ > + #--------------------------- > + # nc???? sample size of control group > + # d????? the differece to detect between control and experiment > + # ne???? vector of starting sample size of experiment group > + #??? corresonding to rc of 0 to nc*(1-d) > + # alpha? size of test > + # power? target power > + # cc? pre-screen vector of constant c, the range should cover the > + #??? the value of cc that has expected power > + # tol1?? the allowance between the expceted power and target power > + #--------------------------- > + pc<-(0:((1-d)*nc))/nc > + ncl<-length(pc) > + ne.old<-ne > + ne.old1<-ne.old > + ### sweeping forward > + for(i in 1:ncl){ > + cmin<-cc[1] > + cmax<-cc[2] > + ### fixed cci<-cmax bug > + cci <-1 > + lhood<-dbinom((i:ncl)-1,nc,pc[i]) > + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl] > + Ep0 <-Epower(nc, d, ne, pc, alpha) > + while(abs(Ep0[i]-power)>tol1){ > + if(Ep0[i] + else cmax<-cci > + cci<-(cmax+cmin)/2 > + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl] > + Ep0<-Epower(nc, d, ne, pc, alpha) > + } > +? ne.old1<-ne > + } > + ne1<-ne > + ### sweeping backward -- ncl:i > + ne.old2<-ne.old > + ne???? 
<-ne.old > + for(i in ncl:1){ > + cmin<-cc[1] > + cmax<-cc[2] > + ### fixed cci<-cmax bug > + cci <-1 > + lhood<-dbinom((ncl:i)-1,nc,pc[i]) > + lenl <-length(lhood) > + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i] > + Ep0 <-Epower(nc, d, cci*ne, pc, alpha) > + while(abs(Ep0[i]-power)>tol1){ > + if(Ep0[i] + else cmax<-cci > + cci<-(cmax+cmin)/2 > + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i] > + Ep0<-Epower(nc, d, ne, pc, alpha) > + } > +? ne.old2<-ne > + } > + ne2<-ne > + ne<-(ne1+ne2)/2 > + #cat(ccc*ne) > + Ep1<-Epower(nc, d, ne, pc, alpha) > + return(ne=ne, Ep=Ep1) > + } > > ### > > vertex<-function(x,y) > + { n<-length(x) > + vx<-x[1] > + vy<-y[1] > + vp<-1 > + up<-T > + for (i in (2:n)) > + { if (up) > + { if (y[i-1] > y[i]) > + {vx<-c(vx,x[i-1]) > +? vy<-c(vy,y[i-1]) > +? vp<-c(vp,i-1) > +? up<-F > + } > + } > + else > + { if (y[i-1] < y[i]) up<-T > + } > + } > + vx<-c(vx,x[n]) > + vy<-c(vy,y[n]) > + vp<-c(vp,n) > + return(vx=vx,vy=vy,vp=vp) > + } > > ### > > convex<-function(x,y) > + { > + n<-length(x) > + ans<-vertex(x,y) > + len<-length(ans$vx) > + while (len>3) > + { > + #cat("x=",x,"\n") > + #cat("y=",y,"\n") > + newx<-x[1:(ans$vp[2]-1)] > + newy<-y[1:(ans$vp[2]-1)] > + for (i in (2:(len-1))) > + { > +? newx<-c(newx,x[ans$vp[i]]) > + newy<-c(newy,y[ans$vp[i]]) > + } > + newx<-c(newx,x[(ans$vp[len-1]+1):n]) > + newy<-c(newy,y[(ans$vp[len-1]+1):n]) > + y<-approx(newx,newy,xout=x)$y > + #cat("new y=",y,"\n") > + ans<-vertex(x,y) > + len<-length(ans$vx) > + #cat("vx=",ans$vx,"\n") > + #cat("vy=",ans$vy,"\n") > + } > + return(wx=x,wy=y)} > > ### > > Epower<-function(nc, d, ne, pc = (0:((1 - d) * nc))/nc, alpha = 0.05) > + { > + #------------------------------------- > + # nc???? sample size in historical control > + # d????? the increase of response rate between historical and experiment > + # ne???? sample size of corresonding rc of 0 to nc*(1-d) > + # pc???? the response rate of control group, where we compute the > + #??????? 
expected power > + # alpha? the size of test > + #------------------------------------- > + kk <- length(pc) > + rc <- 0:(nc * (1 - d)) > + pp <- rep(NA, kk) > + ppp <- rep(NA, kk) > + for(i in 1:(kk)) { > + pe <- pc[i] + d > + lhood <- dbinom(rc, nc, pc[i]) > + pp <- power1.f(rc, nc, ne, pe, alpha) > + ppp[i] <- sum(pp * lhood)/sum(lhood) > + } > + return(ppp) > + } > > > > # adapted from the old biss2 > > ss.rand<-function(rc,nc,d,alpha=.05,power=.8,tol=.01) > + { > + ne<-nc > + ne1<-nc+50 > + while(abs(ne-ne1)>tol & ne1<100000){ > + ne<-ne1 > + pe<-d+rc/nc > + ne1<-nef2(rc,nc,pe*ne,ne,alpha,power) > + > + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne)) > + } > + if (ne1>100000) return(NA) > + else return(ne1) > + } > > ### > > power1.f<-function(rc,nc,ne,pie,alpha=0.05){ > + #------------------------------------- > + # rcnumber of response in historical control > + # ncsample size in historical control > + # ne??? sample size in experitment group > + # pietrue response rate for experiment group > + # alphasize of the test > + #------------------------------------- > + > + za<-qnorm(1-alpha) > + re<-ne*pie > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > + ans<-za*sqrt(1+(ne+0.5)/(nc+0.5))-(xe-xc)/sqrt(1/(4*(ne+0.5))) > + return(1-pnorm(ans)) > + } > > [[alternative HTML version deleted]] From fernando.cabrera at nordea.com Wed Oct 5 18:11:18 2011 From: fernando.cabrera at nordea.com (fernando.cabrera at nordea.com) Date: Wed, 5 Oct 2011 18:11:18 +0200 Subject: [R] Populate a matrix Message-ID: Hi guys I have vectors x <- c(1,2,3,4) and y <- c(4,3,9) and would like to generate a matrix which has 3 rows (length(y)) and 4 columns (length(x)), and each row is the corresponding y element repeated length(x) times. 4,4,4,4 3,3,3,3 9,9,9,9 Thanks. 
Fernando Álvarez From Samuel.Le at srlglobal.com Wed Oct 5 18:15:27 2011 From: Samuel.Le at srlglobal.com (Samuel Le) Date: Wed, 5 Oct 2011 16:15:27 +0000 Subject: [R] Populate a matrix In-Reply-To: References: Message-ID: mat <- matrix(ncol = length(x), nrow = length(y)) for(i in 1:length(x)) { mat[,i] = y} HTH, Samuel -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of fernando.cabrera at nordea.com Sent: 05 October 2011 17:11 To: r-help at r-project.org Subject: [R] Populate a matrix Hi guys I have vectors x <- c(1,2,3,4) and y <- c(4,3,9) and would like to generate a matrix which has 3 rows (length(y)) and 4 columns (length(x)), and each row is the corresponding y element repeated length(x) times. 4,4,4,4 3,3,3,3 9,9,9,9 Thanks. Fernando Álvarez From michael.weylandt at gmail.com Wed Oct 5 18:17:29 2011 From: michael.weylandt at gmail.com (R.
Michael Weylandt) Date: Wed, 5 Oct 2011 12:17:29 -0400 Subject: [R] Populate a matrix In-Reply-To: References: Message-ID: matrix(rep(y, each=length(x)), nrow=length(y), byrow=TRUE) or less explicitly matrix(y, nrow=length(y),ncol=length(x)) Michael On Wed, Oct 5, 2011 at 12:11 PM, wrote: > Hi guys > > I have vectors x <- c(1,2,3,4) and y <- c(4,3,9) and would like to generate a matrix which has 3 rows (length(y)) and 4 columns (length(x)), and each row is the corresponding y element repeated length(x) times. > > 4,4,4,4 > 3,3,3,3 > 9,9,9,9 > > Thanks. > > Fernando Álvarez > From batholdy at googlemail.com Wed Oct 5 18:20:29 2011 From: batholdy at googlemail.com (Martin Batholdy) Date: Wed, 5 Oct 2011 18:20:29 +0200 Subject: [R] converting 3D array to a data-frame (with coordinate-columns x, y, z) Message-ID: <12BF49A5-E7D4-4CF2-8A6B-4F5CC4496181@googlemail.com> Hi, I am still struggling with three dimensional arrays. Now I would like to convert a three dimensional array into a data-frame with the coordinate-columns: x, y, z and a value-column. And I definitely don't want to loop over every element, since this would be very resource intensive for the actual data-set. Are there any specific functions that are helpful for this task? example-array: x <- array(1:27, dim=c(3,3,3,1)) thanks! From michael.weylandt at gmail.com Wed Oct 5 18:21:15 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 12:21:15 -0400 Subject: [R] Populate a matrix In-Reply-To: References: Message-ID: One more version: somewhere in the middle of the explicitness scale, matrix(rep(y, times = length(x)), nrow=length(y)) On Wed, Oct 5, 2011 at 12:17 PM, R.
Michael Weylandt wrote: > matrix(rep(y, each=length(x)), nrow=length(y), byrow=TRUE) > or less explicitly > matrix(y, nrow=length(y),ncol=length(x)) > Michael > > On Wed, Oct 5, 2011 at 12:11 PM, wrote: >> Hi guys >> >> I have vectors x <- c(1,2,3,4) and y <- c(4,3,9) and would like to generate a matrix which has 3 rows (length(y)) and 4 columns (length(x)), and each row is the corresponding y element repeated length(x) times. >> >> 4,4,4,4 >> 3,3,3,3 >> 9,9,9,9 >> >> Thanks. >> >> Fernando Álvarez >> > From michael.weylandt at gmail.com Wed Oct 5 18:23:40 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 12:23:40 -0400 Subject: [R] converting 3D array to a data-frame (with coordinate-columns x, y, z) In-Reply-To: <12BF49A5-E7D4-4CF2-8A6B-4F5CC4496181@googlemail.com> References: <12BF49A5-E7D4-4CF2-8A6B-4F5CC4496181@googlemail.com> Message-ID: reshape::melt does this I think Michael On Wed, Oct 5, 2011 at 12:20 PM, Martin Batholdy wrote: > Hi, > > > I am still struggling with three dimensional arrays. > > Now I would like to convert a three dimensional array into a data-frame with the coordinate-columns: x, y, z and a value-column. > > And I definitely don't want to loop over every element, since this would be very resource intensive for the actual data-set. > > > Are there any specific functions that are helpful for this task? > > > > example-array: > > x <- array(1:27, dim=c(3,3,3,1)) > > > > thanks!
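Michael's pointer to melt() is one option. A base-R sketch that avoids loops: expand.grid() generates every index combination with the first index varying fastest, which is the same column-major order in which as.vector() reads an array, so coordinates and values line up row by row. drop() removes the trailing dimension of length 1 in Martin's example array; the column names x, y, z are taken from the question.

```r
arr <- drop(array(1:27, dim = c(3, 3, 3, 1)))   # drop() turns 3 x 3 x 3 x 1 into 3 x 3 x 3

# All index triples, first index varying fastest (same order as as.vector(arr))
coords <- expand.grid(lapply(dim(arr), seq_len))
names(coords) <- c("x", "y", "z")

# One row per cell: its coordinates plus its value
df <- data.frame(coords, value = as.vector(arr))
head(df, 4)
```

Everything here is vectorised, so it scales to large arrays without an explicit loop; the same code works for any number of dimensions if the `names()` line is adjusted.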
From djmuser at gmail.com Wed Oct 5 18:24:31 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 5 Oct 2011 09:24:31 -0700 Subject: [R] mean of 3D arrays In-Reply-To: <5E6A2740-10A7-4CED-9B41-32A109317325@googlemail.com> References: <5E6A2740-10A7-4CED-9B41-32A109317325@googlemail.com> Message-ID: Hi: There are a few ways to do this. If you only have a few arrays, you can simply add them and divide by the number of arrays. If you have a large number of such arrays, this is inconvenient, so an alternative is to ship the arrays into a list and use the Reduce() function. For your example, L <- list(x1, x2, x3) Reduce('+', L)/length(L) would work. If you have many such arrays in separate files, you can always use lapply() in conjunction with a suitable read function with an input vector that contains the file names to be read, of the general form L <- lapply(, ) with the idea that the read function passes arrays into the list components. Here's a simple toy example with three small matrices to illustrate proof of concept: > t1 <- matrix(1:9, nrow = 3) > t2 <- matrix(-4:4, nrow = 3) > t3 <- matrix(-3:5, nrow = 3) > (t1 + t2 + t3)/3 [,1] [,2] [,3] [1,] -2 1 4 [2,] -1 2 5 [3,] 0 3 6 > Reduce('+', list(t1, t2, t3))/3 [,1] [,2] [,3] [1,] -2 1 4 [2,] -1 2 5 [3,] 0 3 6 HTH, Dennis On Wed, Oct 5, 2011 at 6:00 AM, Martin Batholdy wrote: > Hi, > > I have multiple three dimensional arrays. > > Like this: > > x1 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) > x2 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) > x3 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) > > > Now I would like to compute the mean for each corresponding cell.
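Dennis's Reduce() approach can be applied to Martin's three 10 x 10 x 10 arrays as follows. The seed is added only to make the sketch reproducible. The second variant, which stacks the arrays along a fourth dimension and averages over it with apply(), is an alternative not taken from the thread; it turns other cell-wise summaries (median, sd) into a one-word change.

```r
set.seed(1)   # only for reproducibility of this sketch
x1 <- array(rnorm(1000, 1, 2), dim = c(10, 10, 10))
x2 <- array(rnorm(1000, 1, 2), dim = c(10, 10, 10))
x3 <- array(rnorm(1000, 1, 2), dim = c(10, 10, 10))

# Dennis's approach: element-wise sum of all list members, divided by their count
L <- list(x1, x2, x3)
mean_reduce <- Reduce(`+`, L) / length(L)

# Alternative: stack into a 10 x 10 x 10 x 3 array and average over the last dimension
stacked <- array(unlist(L), dim = c(dim(x1), length(L)))
mean_apply <- apply(stacked, 1:3, mean)
```

Reduce() keeps only one running sum in memory, whereas the stacked version holds all arrays at once; for many large arrays the Reduce() form is the lighter choice.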
> As a result I want to get one 3D array (10 x 10 x 10) in which at position x, y, z is the mean of the corresponding values of x1, x2 and x3 (at position x, y, z). > > > How can I do this? > From wdunlap at tibco.com Wed Oct 5 18:25:03 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 5 Oct 2011 16:25:03 +0000 Subject: [R] SPlus to R In-Reply-To: References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: I think you only have to change the multi-argument returns to call list. You can remove the name from the single-argument return, as it will be ignored: return(name=value) -> return(value) return(n1=v1, n2=v2) -> return(list(n1=v1, n2=v2)) (I say "I think" because I don't have easy access to S+ 4.5, from 1999 or so, to check this out.) Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap > Sent: Wednesday, October 05, 2011 9:09 AM > To: Scott Raynaud; r-help at r-project.org > Subject: Re: [R] SPlus to R > > It looks like this code was written for S+ 4.5 (aka '2000') > or before, which was based on S version 3. Try changing > return(name1=value1, name2=value2) > to > return(list(name1=value1, name2=value2)) > In S+ from 5.0 onwards return(name=value) or return(name1=value1, > name2=value2) throws away the names and in R return only takes > a single object (and also ignores the name).
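Bill's advice for the S-Plus port, in code form. This is a sketch: `old_style` and `new_style` are hypothetical stand-ins for functions such as c.searchd(), and the arithmetic inside them is a placeholder, not the real expected-power calculation. In R, calling the S version 3 style function raises an error (Scott's session reported "multi-argument returns are not permitted"), while the list version returns the named components that the calling code reads as ans$ne and ans$Ep.

```r
# S version 3 style: parses fine in R, but fails when called
old_style <- function(ne) {
  Ep1 <- ne / (ne + 1)          # placeholder computation (hypothetical)
  return(ne = ne, Ep = Ep1)
}

# R style: bundle the results into one named list
new_style <- function(ne) {
  Ep1 <- ne / (ne + 1)          # placeholder computation (hypothetical)
  return(list(ne = ne, Ep = Ep1))
}

# Capture the error raised by the old-style call instead of stopping
err <- tryCatch(old_style(10), error = function(e) e)
print(conditionMessage(err))

ans <- new_style(c(10, 20))
ans$ne
ans$Ep
```

The same one-line change applies to the other multi-value returns in the script, e.g. the ends of vertex() and convex(), whose callers also use `$` to pick out components.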
> > The c.search function in your code ends with > return(ne=ne, Ep=Ep1) > and the code calling c.search() acts as though the writer > expects that function to return list(ne=ne, Ep=Ep1) > ans <- c.searchd(nc, d, ne, alpha, power, cc, tol1) > ... > old.ne <- ans$ne > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > > > -----Original Message----- > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Scott Raynaud > > Sent: Tuesday, October 04, 2011 6:53 PM > > To: r-help at r-project.org > > Subject: [R] SPlus to R > > > > I'm trying to convert an S-Plus program to R.? Since I'm a SAS programmer I'm not facile is either > S- > > Plus or R, so I need some help.? All I did was convert the underscores in S-Plus to the assignment > > operator <-.? Here are the first few lines of the S-Plus file: > > > > sshc _ function(rc, nc, d, method, alpha=0.05, power=0.8, > > ???????????? tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > > { > > ### for method 1 > > if (method==1) { > > ne1 _ ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > > return(ne=ne1) > > ?????????????? } > > > > > > My?translation looks like this: > > > > sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > > ????????????? tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > > { > > ### for method 1 > > if (method==1) { > > ?ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > > ?return(ne=ne1) > > ?????????????? } > > > > The program runs without throwing errors, but I'm not getting any ourput in the console.? This is > > where it should be, right?? I think I have this set up correctly.? I'm using method=3 which only > > requires nc and d to be specified.? Any ideas why I'm not seeing output? > > > > Here is the entire output: > > > > > ## sshc.ssc: sample size calculation for historical control studies > > > ## J. Jack Lee (jjlee at mdanderson.org) and Chi-hong Tseng > > > ## Department of Biostatistics, Univ. of Texas M.D. 
Anderson Cancer Center > > > ## > > > ## 3/1/99 > > > ## updated 6/7/00: add loess > > > ##------------------------------------------------------------------ > > > ######## Required Input: > > > # > > > # rc     number of response in historical control group > > > # nc     sample size in historical control > > > # d      target improvement = Pe - Pc > > > # method 1=method based on the randomized design > > > #        2=Makuch & Simon method (Makuch RW, Simon RM. Sample size considerations > > > #          for non-randomized comparative studies. J of Chron Dis 1980; 3:175-181. > > > #        3=uniform power method > > > ######## optional Input: > > > # > > > # alpha  size of the test > > > # power  desired power of the test > > > # tol    convergence criterion for methods 1 & 2 in terms of sample size > > > # tol1   convergence criterion for method 3 at any given obs Rc in terms of difference > > > #          of expected power from target > > > # tol2   overall convergence criterion for method 3 as the max absolute deviation > > > #          of expected power from target for all Rc > > > # cc     range of multiplicative constant applied to the initial values ne > > > # l.span smoothing constant for loess > > > # > > > # Note:  rc is required for methods 1 and 2 but not 3 > > > #        method 3 returns the sample size needed for rc=0 to (1-d)*nc > > > # > > > ######## Output > > > # for methods 1 & 2: return the sample size needed for the experimental group (1 number) > > > #                    for given rc, nc, d, alpha, and power > > > # for method 3:      return the profile of sample size needed for given nc, d, alpha, and power > > > #                    vector $ne contains the sample size corresponding to rc=0, 1, 2, ... nc*(1-d) > > > #                    vector $Ep contains the expected power corresponding to > > > #
the true pc = (0, 1, 2, ..., nc*(1-d)) / nc > > > # > > > #------------------------------------------------------------------ > > > sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > > +              tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > > + { > > + ### for method 1 > > + if (method==1) { > > + ne1<-ss.rand(rc,nc,d,alpha=.05,power=.8,tol=.01) > > + return(ne=ne1) > > +                } > > + ### for method 2 > > + if (method==2) { > > + ne<-nc > > + ne1<-nc+50 > > + while(abs(ne-ne1)>tol & ne1<100000){ > > + ne<-ne1 > > + pe<-d+rc/nc > > + ne1<-nef(rc,nc,pe*ne,ne,alpha,power) > > + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne)) > > + } > > + if (ne1>100000) return(NA) > > + else return(ne=ne1) > > + } > > + ### for method 3 > > + if (method==3) { > > + if (tol1 > tol2/10) tol1<-tol2/10 > > + ncstar<-(1-d)*nc > > + pc<-(0:ncstar)/nc > > + ne<-rep(NA,ncstar + 1) > > + for (i in (0:ncstar)) > > + { ne[i+1]<-ss.rand(i,nc,d,alpha=.05,power=.8,tol=.01) > > + } > > + plot(pc,ne,type='l',ylim=c(0,max(ne)*1.5)) > > + ans<-c.searchd(nc, d, ne, alpha, power, cc, tol1) > > + ### check overall absolute deviance > > + old.abs.dev<-sum(abs(ans$Ep-power)) > > + ##bad<-0 > > + print(round(ans$Ep,4)) > > + print(round(ans$ne,2)) > > + lines(pc,ans$ne,lty=1,col=8) > > + old.ne<-ans$ne > > + ##while(max(abs(ans$Ep-power))>tol2 & bad==0){  #### unnecessary ## > > + while(max(abs(ans$Ep-power))>tol2){ > > + ans<-c.searchd(nc, d, ans$ne, alpha, power, cc, tol1) > > + abs.dev<-sum(abs(ans$Ep-power)) > > + print(paste(" old.abs.dev=",old.abs.dev)) > > + print(paste("    
 abs.dev=",abs.dev)) > > + ##if (abs.dev > old.abs.dev) { bad<-1} > > + old.abs.dev<-abs.dev > > + print(round(ans$Ep,4)) > > + print(round(ans$ne,2)) > > + lines(pc,old.ne,lty=1,col=1) > > + lines(pc,ans$ne,lty=1,col=8) > > + ### add convex > > + ans$ne<-convex(pc,ans$ne)$wy > > + ### add loess > > + ###old.ne<-ans$ne > > + loess.ne<-loess(ans$ne ~ pc, span=l.span) > > + lines(pc,loess.ne$fit,lty=1,col=4) > > + old.ne<-loess.ne$fit > > + ###readline() > > + } > > + return(ne=ans$ne, Ep=ans$Ep) > > +                } > > + } > > > > > > ## needed for method 1 > > > nef2<-function(rc,nc,re,ne,alpha,power){ > > + za<-qnorm(1-alpha) > > + zb<-qnorm(power) > > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > > + ans<- 1/(4*(xc-xe)^2/(za+zb)^2-1/(nc+0.5)) - 0.5 > > + return(ans) > > + } > > > ## needed for method 2 > > > nef<-function(rc,nc,re,ne,alpha,power){ > > + za<-qnorm(1-alpha) > > + zb<-qnorm(power) > > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > > + ans<-(za*sqrt(1+(ne+0.5)/(nc+0.5))+zb)^2/(2*(xe-xc))^2-0.5 > > + return(ans) > > + } > > > ## needed for method 3 > > > c.searchd<-function(nc, d, ne, alpha=0.05, power=0.8, cc=c(0.1,2),tol1=0.0001){ > > + #--------------------------- > > + # nc     sample size of control group > > + # d      the difference to detect between control and experiment > > + # ne     vector of starting sample size of experiment group > > + #    corresponding to rc of 0 to nc*(1-d) > > + # alpha  size of test > > + # power  target power > > + # cc  pre-screen vector of constant c, the range should cover the > > + #    value of cc that has expected power > > + # tol1
the allowance between the expected power and target power > > + #--------------------------- > > + pc<-(0:((1-d)*nc))/nc > > + ncl<-length(pc) > > + ne.old<-ne > > + ne.old1<-ne.old > > + ### sweeping forward > > + for(i in 1:ncl){ > > + cmin<-cc[1] > > + cmax<-cc[2] > > + ### fixed cci<-cmax bug > > + cci <-1 > > + lhood<-dbinom((i:ncl)-1,nc,pc[i]) > > + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl] > > + Ep0 <-Epower(nc, d, ne, pc, alpha) > > + while(abs(Ep0[i]-power)>tol1){ > > + if(Ep0[i] > + else cmax<-cci > > + cci<-(cmax+cmin)/2 > > + ne[i:ncl]<-(1+(cci-1)*(lhood/lhood[1])) * ne.old1[i:ncl] > > + Ep0<-Epower(nc, d, ne, pc, alpha) > > + } > > +  ne.old1<-ne > > + } > > + ne1<-ne > > + ### sweeping backward -- ncl:i > > + ne.old2<-ne.old > > + ne     <-ne.old > > + for(i in ncl:1){ > > + cmin<-cc[1] > > + cmax<-cc[2] > > + ### fixed cci<-cmax bug > > + cci <-1 > > + lhood<-dbinom((ncl:i)-1,nc,pc[i]) > > + lenl <-length(lhood) > > + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i] > > + Ep0 <-Epower(nc, d, cci*ne, pc, alpha) > > + while(abs(Ep0[i]-power)>tol1){ > > + if(Ep0[i] > + else cmax<-cci > > + cci<-(cmax+cmin)/2 > > + ne[ncl:i]<-(1+(cci-1)*(lhood/lhood[lenl]))*ne.old2[ncl:i] > > + Ep0<-Epower(nc, d, ne, pc, alpha) > > + } > > +  ne.old2<-ne > > + } > > + ne2<-ne > > + ne<-(ne1+ne2)/2 > > + #cat(ccc*ne) > > + Ep1<-Epower(nc, d, ne, pc, alpha) > > + return(ne=ne, Ep=Ep1) > > + } > > > ### > > > vertex<-function(x,y) > > + { n<-length(x) > > + vx<-x[1] > > + vy<-y[1] > > + vp<-1 > > + up<-T > > + for (i in (2:n)) > > + { if (up) > > + { if (y[i-1] > y[i]) > > + {vx<-c(vx,x[i-1]) > > +  vy<-c(vy,y[i-1]) > > +  vp<-c(vp,i-1) > > + 
up<-F > > + } > > + } > > + else > > + { if (y[i-1] < y[i]) up<-T > > + } > > + } > > + vx<-c(vx,x[n]) > > + vy<-c(vy,y[n]) > > + vp<-c(vp,n) > > + return(vx=vx,vy=vy,vp=vp) > > + } > > > ### > > > convex<-function(x,y) > > + { > > + n<-length(x) > > + ans<-vertex(x,y) > > + len<-length(ans$vx) > > + while (len>3) > > + { > > + #cat("x=",x,"\n") > > + #cat("y=",y,"\n") > > + newx<-x[1:(ans$vp[2]-1)] > > + newy<-y[1:(ans$vp[2]-1)] > > + for (i in (2:(len-1))) > > + { > > +  newx<-c(newx,x[ans$vp[i]]) > > + newy<-c(newy,y[ans$vp[i]]) > > + } > > + newx<-c(newx,x[(ans$vp[len-1]+1):n]) > > + newy<-c(newy,y[(ans$vp[len-1]+1):n]) > > + y<-approx(newx,newy,xout=x)$y > > + #cat("new y=",y,"\n") > > + ans<-vertex(x,y) > > + len<-length(ans$vx) > > + #cat("vx=",ans$vx,"\n") > > + #cat("vy=",ans$vy,"\n") > > + } > > + return(wx=x,wy=y)} > > > ### > > > Epower<-function(nc, d, ne, pc = (0:((1 - d) * nc))/nc, alpha = 0.05) > > + { > > + #------------------------------------- > > + # nc     sample size in historical control > > + # d      the increase of response rate between historical and experiment > > + # ne     sample size of corresponding rc of 0 to nc*(1-d) > > + # pc     the response rate of control group, where we compute the > > + #        expected power > > + # alpha 
the size of test > > + #------------------------------------- > > + kk <- length(pc) > > + rc <- 0:(nc * (1 - d)) > > + pp <- rep(NA, kk) > > + ppp <- rep(NA, kk) > > + for(i in 1:(kk)) { > > + pe <- pc[i] + d > > + lhood <- dbinom(rc, nc, pc[i]) > > + pp <- power1.f(rc, nc, ne, pe, alpha) > > + ppp[i] <- sum(pp * lhood)/sum(lhood) > > + } > > + return(ppp) > > + } > > > > > > # adapted from the old biss2 > > > ss.rand<-function(rc,nc,d,alpha=.05,power=.8,tol=.01) > > + { > > + ne<-nc > > + ne1<-nc+50 > > + while(abs(ne-ne1)>tol & ne1<100000){ > > + ne<-ne1 > > + pe<-d+rc/nc > > + ne1<-nef2(rc,nc,pe*ne,ne,alpha,power) > > + > > + ## if(is.na(ne1)) print(paste('rc=',rc,',nc=',nc,',pe=',pe,',ne=',ne)) > > + } > > + if (ne1>100000) return(NA) > > + else return(ne1) > > + } > > > ### > > > power1.f<-function(rc,nc,ne,pie,alpha=0.05){ > > + #------------------------------------- > > + # rc    number of response in historical control > > + # nc    sample size in historical control > > + # ne    sample size in experiment group > > + # pie   true response rate for experiment group > > + # alpha size of the test > > + #------------------------------------- > > + > > + za<-qnorm(1-alpha) > > + re<-ne*pie > > + xe<-asin(sqrt((re+0.375)/(ne+0.75))) > > + xc<-asin(sqrt((rc+0.375)/(nc+0.75))) > > + ans<-za*sqrt(1+(ne+0.5)/(nc+0.5))-(xe-xc)/sqrt(1/(4*(ne+0.5))) > > + return(1-pnorm(ans)) > > + } > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
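As a minimal sketch of the change Bill describes: the S version 3 style `return(ne = ..., Ep = ...)` fails in R as soon as the function is called, while returning a named list lets the caller keep using `ans$ne` and `ans$Ep`. Here `old_style()` and `search_sketch()` are hypothetical stand-ins for `c.searchd()`, and the "power" values are placeholders, not the real `Epower()` computation.

```r
# S version 3 style: several named arguments to return(). R rejects this at call time.
old_style <- function(ne) return(ne = ne, Ep = ne / 100)
msg <- tryCatch(old_style(50), error = function(e) conditionMessage(e))
print(msg)  # R's error message about multi-argument returns

# R style: bundle the values in a named list.
search_sketch <- function(ne) {
  Ep <- pmin(1, ne / 100)          # hypothetical placeholder for the expected power
  return(list(ne = ne, Ep = Ep))
}

ans <- search_sketch(c(20, 50, 150))
print(ans$ne)  # 20 50 150
print(ans$Ep)  # 0.2 0.5 1.0
```

The same one-line change applies to `vertex()`, `convex()` and the final `return()` in `sshc()`, which also end in multi-value `return()` calls in the posted code.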
From rainer.schuermann at gmx.net Wed Oct 5 18:29:12 2011 From: rainer.schuermann at gmx.net (Rainer Schuermann) Date: Wed, 05 Oct 2011 18:29:12 +0200 Subject: [R] Populate a matrix In-Reply-To: References: Message-ID: <2885999.ONQgauKSmv@augeatur> m <- matrix( rep( y, length( x ) ), length( y ), length( x ) ) On Wednesday 05 October 2011 18:11:18 fernando.cabrera at nordea.com wrote: > Hi guys > > I have vectors x <- c(1,2,3,4) and y <- c(4,3,9) and would like to generate a matrix which has 3 rows (length(y)) and 4 columns (length(x)), and each row is the corresponding y element repeated length(x) times. > > 4,4,4,4 > 3,3,3,3 > 9,9,9,9 > > Thanks. > > Fernando Álvarez > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From wdunlap at tibco.com Wed Oct 5 18:44:08 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 5 Oct 2011 16:44:08 +0000 Subject: [R] Vector-subsetting with ZERO - Is behavior changeable? In-Reply-To: References: Message-ID: You can use [1] on the output of FUN to ensure that exactly one value (perhaps NA from numeric(0)[1]) is returned. E.g. > index <- 1 > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)][1]}) [1] 2 1 NA I'll also put in a plug for vapply, which throws an error if FUN does not return what you expect it to: > vapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]}, FUN.VALUE=numeric(1)) Error in vapply(list(c(1, 2, 3), c(1, 2), c(1)), function(x) { : values must be length 1, but FUN(X[[3]]) result is length 0 > vapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)][1]}, FUN.VALUE=numeric(1)) [1] 2 1 NA For long input vectors vapply can save a fair bit of memory and time over sapply.
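A short sketch tying together the two answers above. Rainer's `matrix(rep(...))` call fills the matrix column by column, so each column is a copy of `y` and each row is constant; letting `matrix()` recycle `y` gives the same result. The `[1]` trick plus `vapply()` is Bill's suggestion; `second_to_last()` is a hypothetical helper name used only for this illustration.

```r
x <- c(1, 2, 3, 4)
y <- c(4, 3, 9)

# Rainer's answer: repeat y once per column and fill column by column.
m1 <- matrix(rep(y, length(x)), nrow = length(y), ncol = length(x))
# Equivalent: matrix() recycles y down each column (12 cells, a multiple of 3).
m2 <- matrix(y, nrow = length(y), ncol = length(x))
print(m1)

# Bill's answer: [1] turns a zero-length result into NA, and vapply()
# insists that every call returns exactly one number.
second_to_last <- function(v, index = 1) v[max(length(v) - index, 0)][1]
vals <- vapply(list(c(1, 2, 3), c(1, 2), c(1)), second_to_last,
               FUN.VALUE = numeric(1))
print(vals)  # 2 1 NA
```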
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Johannes > Graumann > Sent: Wednesday, October 05, 2011 4:29 AM > To: r-help at stat.math.ethz.ch > Subject: [R] Vector-subsetting with ZERO - Is behavior changeable? > > Dear All, > > I have trouble generalizing some code. > > > index <- 0 > > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]}) > Will yield a wished-for vector like so: > [1] 3 2 1 > > But in this case (trying to select the second-to-last element in each vector > of the list) > > index <- 1 > > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]}) > I end up with > [[1]] > [1] 2 > > [[2]] > [1] 1 > > [[3]] > numeric(0) > > I would (massively) prefer something like > [1] 2 1 NA > > My current implementation looks like > > index <- 1 > > unlist( > > sapply( > > list(c(1,2,3),c(1,2),c(1)), > > function(x){ > > value <- x[max(length(x)-index,0)] > > if(identical(value,numeric(0))){return(NA)} else {return(value)} > > } > > ) > > ) > [1] 2 1 NA > > Quite the inelegant eyesore. > > Any hints on how to do this better? > > Thanks, Joh > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From tal.galili at gmail.com Wed Oct 5 18:53:23 2011 From: tal.galili at gmail.com (Tal Galili) Date: Wed, 5 Oct 2011 18:53:23 +0200 Subject: [R] Does it exist a function for this? In-Reply-To: <1317804140079-3873827.post@n4.nabble.com> References: <1317804140079-3873827.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From pauljohn32 at gmail.com Wed Oct 5 19:26:38 2011 From: pauljohn32 at gmail.com (Paul Johnson) Date: Wed, 5 Oct 2011 12:26:38 -0500 Subject: [R] Running a GMM Estimation on dynamic Panel Model using plm-Package In-Reply-To: <1307907816544-3592466.post@n4.nabble.com> References: <1307907816544-3592466.post@n4.nabble.com> Message-ID: On Sun, Jun 12, 2011 at 2:43 PM, bstudent wrote: > Hello, > > although I searched for a solution related to my problem I didn't find one, > yet. My skills in R aren't very large, however. > For my Diploma thesis I need to run a GMM estimation on a dynamic panel > model using the "pgmm" - function in the plm-Package. > > The model I want to estimate is: "Y(t) = Y(t-1) + X1(t) + X2(t) + X3(t)" . > > There are no "normal" instruments in this model. There just should be the > "gmm-instruments" I need for the model. > In order to estimate it, I tried the following code: > >> >> library(plm) >> >> test <- pgmm(Y ~ lag(Y, 1) + X1 + X2 + X3 | lag(Y, 1), data=Model, >> effect="individual", model="onestep") >> >> > > I tried "Model" as "Modelp <- pdata.frame(..." and as "Model <- > read.table(..." but in both cases there's an error message: > > Error in solve.default(Reduce("+", A2)) : > system is computationally singular: reciprocal condition number = > 4.08048e-22 > Hello, I have students working on similar problems. Here is what I would say to them: Without a dataset and code that is supposed to work, nobody can figure out what's wrong and help you around it. 2 suggestions 1. directly contact Yves Croissant, the plm principal author, and give him your R code and the data set. Show him the error output you get. Here's the contact information: Yves Croissant If he answers, please let us know. If you don't want to (or can't) give real data, make some up that causes the same crash. 2. post in here a link to your data and the full code and I will try to debug it to at least find out where this is going wrong.
I've been studying debugging with R functions and this is a good opportunity for me. I stopped focusing on panel estimator details in 2000, so I'm rusty, but will probably recognize most of what is going on. If you don't want to broadcast this to everybody, you can feel free to contact me directly; pauljohn at ku.edu is my university address. PJ -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas From pauljohn32 at gmail.com Wed Oct 5 19:29:53 2011 From: pauljohn32 at gmail.com (Paul Johnson) Date: Wed, 5 Oct 2011 12:29:53 -0500 Subject: [R] Tinn-R In-Reply-To: <009201cc82c3$04424fb0$7800a8c0@ATRPDC.ATRColumbia.Com> References: <009201cc82c3$04424fb0$7800a8c0@ATRPDC.ATRColumbia.Com> Message-ID: On Tue, Oct 4, 2011 at 1:25 PM, Charles McClure wrote: > I am new to R and have recently tried Tinn-R with very mixed and unexpected > results. Can you point me to a Tinn-R tutorial on the web or a decent > reference book? > In my experience, TINN-R does not work so well, and most new users are advised instead to try Notepad++ with the add-on component R2notepad++, or RStudio. I have MS Windows setup tips here http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Windows:AdminTips Until I see evidence otherwise, I'm concluding that TINN-R was the best in 2008, but it is harder to configure now and there's no reason to prefer it over Notepad++. "Real Men"[tm] still use Emacs, but new users may not have enough muscles :) pj > Thank you for your help; > > Charles McClure -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas From batholdy at googlemail.com Wed Oct 5 19:39:35 2011 From: batholdy at googlemail.com (Martin Batholdy) Date: Wed, 5 Oct 2011 19:39:35 +0200 Subject: [R] speed up this algorithm (apply-fuction / 4D array) Message-ID: <4C6DA2DA-1ABB-433C-BEFE-57F66689ED17@googlemail.com> Hi, I have this sample-code (see below) and I was wondering whether it is possible to speed things up.
What this code does is the following: x is a 4D array (you can imagine it as x, y, z-coordinates and a time-coordinate). So x contains 50x50x50 data-arrays for 91 time-points. Now I want to reduce the 91 time-points. I want to merge three consecutive time points into one time point by calculating the mean of these three time-points for every x,y,z coordinate. The reduce-sequence defines which time-points should get merged. And the apply-function in the for-loop calculates the mean of the three 3D-Arrays and puts them into a new 4D array (data_reduced). The problem is that even in this example it takes really long. I thought apply would already vectorize, rather than loop over every coordinate. But for my actual data-set it takes a really long time ... So I would be really grateful for any suggestions on how to speed this up. x <- array(rnorm(50 * 50 * 50 * 90, 0, 2), dim=c(50, 50, 50, 91)) data_reduced <- array(0, dim=c(50, 50, 50, 90/3)) reduce <- seq(1,90, 3) for( i in 1:length(reduce) ) { data_reduced[ , , , i] <- apply(x[ , , , reduce[i] : (reduce[i]+3) ], 1:3, mean) } From ahoerner at rprogress.org Wed Oct 5 19:45:17 2011 From: ahoerner at rprogress.org (andrewH) Date: Wed, 5 Oct 2011 10:45:17 -0700 (PDT) Subject: [R] reporting multiple objects out of a function In-Reply-To: <4E8C1D79.8080205@knmi.nl> References: <1317788874982-3873380.post@n4.nabble.com> <4E8C1D79.8080205@knmi.nl> Message-ID: <1317836717749-3875586.post@n4.nabble.com> Thanks for the response, Paul! But I thought these dumped the variables into the global environment. Is that not correct? I want to make them available in the calling environment, without making them available in the global environment, unless that is where the function is called. This is my bow to the fact that what I want this function to do is not good programming practice in general.
The whole purpose of this function is to save me time, typing and wear on my limited short-term memory capacity, by having standard objects with standard names quickly available. I wonder if eval.parent would do the job. Like: fun1 <- function(x, y, z) eval.parent{obj1 <- x; obj2 <- y; obj3 <- z }) Or does that just use the parent environment for the inputs, not the output? Part of my problem is that I am not sure how to tell if I have succeeded. Otherwise I would just test it myself. andrewH -- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3875586.html Sent from the R help mailing list archive at Nabble.com. From leandromarino at leandromarino.com.br Wed Oct 5 19:45:31 2011 From: leandromarino at leandromarino.com.br (Leandro Marino) Date: Wed, 5 Oct 2011 14:45:31 -0300 Subject: [R] Tinn-R In-Reply-To: References: <009201cc82c3$04424fb0$7800a8c0@ATRPDC.ATRColumbia.Com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From christopher.a.hane at gmail.com Wed Oct 5 19:55:08 2011 From: christopher.a.hane at gmail.com (Chris Hane) Date: Wed, 5 Oct 2011 10:55:08 -0700 Subject: [R] Party extract BinaryTree from cforest? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From chrismcowen at gmail.com Wed Oct 5 19:55:14 2011 From: chrismcowen at gmail.com (Chris Mcowen) Date: Wed, 5 Oct 2011 18:55:14 +0100 Subject: [R] Advice in model construction Message-ID: Dear list, I am unsure how to structure my model. I have tried something and it makes sense, but I am unsure if I am interpreting it correctly. I have a continuous response variable - the observed quantity of evolutionary history - EH. Then I have a number of species which have a hierarchical structure ~ Genus, Family etc. My research question is: do certain families have significantly higher (or lower) EH values than the others?
Reproducible example: example <- structure(list(Family = structure(c(2L, 1L, 1L, 5L, 7L, 7L, 3L, 4L, 6L, 6L, 1L, 3L), .Label = c("Araceae", "Asphodelaceae", "Bromeliaceae", "Cyperaceae", "Orchidaceae", "Poaceae", "Zingiberaceae"), class = "factor"), Genus = structure(c(3L, 4L, 4L, 1L, 6L, 6L, 2L, 5L, 8L, 9L, 4L, 7L), .Label = c("Acianthera", "Aechmea", "Aloe", "Anthurium", "Bulbostylis", "Hedychium", "Lindmania", "Psathyrostachys", "Sesleria"), class = "factor"), Species = structure(c(9L, 1L, 10L, 11L, 7L, 4L, 3L, 5L, 8L, 2L, 6L, 12L), .Label = c("bonplandii", "coerulans", "cymosopaniculata", "elatum", "emmerichiae", "gehrigeri", "glabrum", "juncea", "pubescens", "sagittatum", "scalpricaulis", "sessilis"), class = "factor"), EH = c(8.746525, 24.462699, 33.03942, 32.719489, 13.598201, 13.598201, 13.164928, 9.339228, 9.69705, 13.478372, 37.497137, 59.562911)), .Names = c("Family", "Genus", "Species", "EH"), class = "data.frame", row.names = c(NA, -12L)) #My model test <- lm(EH~Family, data = example) #in this small example no families are significant but if one was - would that mean they have significantly more EH than the others? Thanks Chris From b.rowlingson at lancaster.ac.uk Wed Oct 5 20:08:19 2011 From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson) Date: Wed, 5 Oct 2011 19:08:19 +0100 Subject: [R] SPlus to R In-Reply-To: <1317830085.62643.YahooMailNeo@web120615.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> <1317830085.62643.YahooMailNeo@web120615.mail.ne1.yahoo.com> Message-ID: On Wed, Oct 5, 2011 at 4:54 PM, Scott Raynaud wrote: > It seems I have things set up correctly.? I suspect that the arguments > sshc(100,10) are the isuue.? It seems that the 100,10 is not necessary since > the code itself specifies the arguments.? 
It runs and produces a power curve > if I simply type sshc() but it also seems to try to keep running something as > I have to click stop to get back to a prompt in the console. > > Why specify 100,10? There are 9 arguments, 3 of which are required and the > rest optional. Shouldn't I have to specify the 3 required arguments, nc, d > and method at a minimum? It would look like sshc(nc=500, d=.5, method=3), > right? I'm still not sure, however, why that would be necessary since it's > hard coded. The sshc(10,100) was just some numbers I plucked out of nowhere. Your definition: sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, + tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) actually probably only needs the first value, the other parameters will take the defaults. sshc(10) should minimally run. [[pedantic note I say probably because R code can look like this: foo = function(x){ if(missing(x)){x = 99} ... } which is the same as foo = function(x=99){...} - so just because there's no default in the function definition it doesn't mean you have to supply it. end pedantic note]] Not sure why you have to click 'stop' - it might be that there's a couple of 'while' loops in there which might not be terminating. There's what looks like some debugging calls to 'cat' commented out - if you uncomment them you'll see what's going on, but you might not see them as they happen in Windows since I don't think the output is normally flushed immediately. There's probably an option you can set or a flush function you can call....
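A small sketch of both of Barry's points. First, a default written in the formals and one supplied through `missing()` in the body behave the same for the caller. Second, calling `flush.console()` (from the utils package) after `cat()` inside a loop is one way to see progress as it happens in the Windows R GUI. `converge_sketch()` is a hypothetical loop, not the `sshc()` code; its update rule simply halves the distance to 0.5 on each pass, and it carries an iteration cap like the `ne1<100000` guard in `ss.rand()`.

```r
# Two ways of giving 'x' a default value.
f_default <- function(x = 99) x
f_missing <- function(x) {
  if (missing(x)) x <- 99
  x
}

# Report progress from inside a while() loop as it runs.
converge_sketch <- function(tol = 1e-3) {
  est <- 1
  steps <- 0
  while (abs(est - 0.5) > tol && steps < 1000) {  # the cap guarantees termination
    est <- (est + 0.5) / 2                        # hypothetical update rule
    steps <- steps + 1
    cat("step", steps, "est =", est, "\n")
    utils::flush.console()                        # push buffered output to the console now
  }
  steps
}

converge_sketch()
```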
Barry From wdunlap at tibco.com Wed Oct 5 20:14:55 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 5 Oct 2011 18:14:55 +0000 Subject: [R] SPlus to R In-Reply-To: References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> <1317830085.62643.YahooMailNeo@web120615.mail.ne1.yahoo.com> Message-ID: I took the original code, changed all return() calls of the form return(n1=v1,n2=v2) to return(list(n1=v1,n2=v2)) and then sshc(10,100) chugged away and produced some plots and returned something with no errors. It took a couple of minutes. I also changed T->TRUE and F->FALSE, as that makes the code safer to use in R, where TRUE is a reserved word but T is not. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Barry Rowlingson > Sent: Wednesday, October 05, 2011 11:08 AM > To: Scott Raynaud > Cc: r-help at r-project.org > Subject: Re: [R] SPlus to R > > On Wed, Oct 5, 2011 at 4:54 PM, Scott Raynaud wrote: > > It seems I have things set up correctly. I suspect that the arguments > > sshc(100,10) are the issue. It seems that the 100,10 is not necessary since > > the code itself specifies the arguments. It runs and produces a power curve > > if I simply type sshc() but it also seems to try to keep running something as > > I have to click stop to get back to a prompt in the console. > > > > Why specify 100,10? There are 9 arguments, 3 of which are required and the > > rest optional. Shouldn't I have to specify the 3 required arguments, nc, d > > and method at a minimum? It would look like sshc(nc=500, d=.5, method=3), > > right? I'm still not sure, however, why that would be necessary since it's > > hard coded. > > The sshc(10,100) was just some numbers I plucked out of nowhere.
Your > definition: > > sshc<-function(rc, nc=500, d=.5, method=3, alpha=0.05, power=0.8, > + tol=0.01, tol1=.0001, tol2=.005, cc=c(.1,2), l.span=.5) > > actually probably only needs the first value, the other parameters > will take the defaults. sshc(10) should minimally run. > > [[pedantic note > I say probably because R code can look like this: > > foo = function(x){ > if(missing(x)){x = 99} > ... > } > > which is the same as foo = function(x=99){...} - so just because > there's no default in the function definition it doesn't mean you have > to supply it. > end pedantic note]] > > Not sure why you have to click 'stop' - it might be that there's a > couple of 'while' loops in there which might not be terminating. > There's what looks like some debugging calls to 'cat' commented out - > if you uncomment them you'll see what's going on, but you might not > see them as they happen in Windows since I dont think the output isn't > normally flushed immediately. There's probably an option you can set > or a flush function you can call.... > > Barry > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From wdunlap at tibco.com Wed Oct 5 20:24:03 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 5 Oct 2011 18:24:03 +0000 Subject: [R] speed up this algorithm (apply-fuction / 4D array) In-Reply-To: <4C6DA2DA-1ABB-433C-BEFE-57F66689ED17@googlemail.com> References: <4C6DA2DA-1ABB-433C-BEFE-57F66689ED17@googlemail.com> Message-ID: I corrected your code a bit and put it into a function, f0, to make testing easier. I also made a small dataset to make testing easier. 
Then I made a new function f1 which does what f0 does in a vectorized manner: x <- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim=c(50, 50, 50, 91)) xsmall <- array(log(seq_len(2 * 2 * 2 * 91)), dim=c(2, 2, 2, 91)) f0 <- function(x) { data_reduced <- array(0, dim=c(dim(x)[1:3], trunc(dim(x)[4]/3))) reduce <- seq(1, dim(x)[4]-1, by=3) for( i in 1:length(reduce) ) { data_reduced[ , , , i] <- apply(x[ , , , reduce[i] : (reduce[i]+2) ], 1:3, mean) } data_reduced } f1 <- function(x) { reduce <- seq(1, dim(x)[4]-1, by=3) data_reduced <- (x[, , , reduce] + x[, , , reduce+1] + x[, , , reduce+2]) / 3 data_reduced } The results were: > system.time(v1 <- f1(x)) user system elapsed 0.280 0.040 0.323 > system.time(v0 <- f0(x)) user system elapsed 73.760 0.060 73.867 > all.equal(v0, v1) [1] TRUE >> "I thought apply would already vectorize, rather than loop over every coordinate." No, you have that backwards. Use *apply functions when you cannot figure out how to vectorize. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Batholdy > Sent: Wednesday, October 05, 2011 10:40 AM > To: R Help > Subject: [R] speed up this algorithm (apply-fuction / 4D array) > > Hi, > > > I have this sample-code (see above) and I was wondering wether it is possible to speed things up. > > > > What this code does is the following: > > x is 4D array (you can imagine it as x, y, z-coordinates and a time-coordinate). > > So x contains 50x50x50 data-arrays for 91 time-points. > > Now I want to reduce the 91 time-points. > I want to merge three consecutive time points to one time-points by calculating the mean of this three > time-points for every x,y,z coordinate. > > The reduce-sequence defines which time-points should get merged. > And the apply-function in the for-loop calculates the mean of the three 3D-Arrays and puts them into a > new 4D array (data_reduced). 
> > > > The problem is that even in this example it takes really long. > I thought apply would already vectorize, rather than loop over every coordinate. > > But for my actual data-set it takes a really long time ... So I would be really grateful for any > suggestions how to speed this up. > > > > > x <- array(rnorm(50 * 50 * 50 * 90, 0, 2), dim=c(50, 50, 50, 91)) > > > > data_reduced <- array(0, dim=c(50, 50, 50, 90/3)) > > reduce <- seq(1,90, 3) > > > > for( i in 1:length(reduce) ) { > > data_reduced[ , , , i] <- apply(x[ , , , reduce[i] : (reduce[i]+3) ], 1:3, mean) > } > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Wed Oct 5 15:32:16 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 5 Oct 2011 08:32:16 -0500 Subject: [R] mean of 3D arrays In-Reply-To: <2A0D0E4F-B6F0-4476-9CFC-7071D2E0AEAB@gmail.com> References: <5E6A2740-10A7-4CED-9B41-32A109317325@googlemail.com> <2A0D0E4F-B6F0-4476-9CFC-7071D2E0AEAB@gmail.com> Message-ID: On Oct 5, 2011, at 8:14 AM, R. Michael Weylandt wrote: > (x1+x2+x3)/3 > > I'm not aware of a "pmean" function but it wouldn't be hard to > homebrew one if you are comfortable with the ... argument > > I'll draft one up and send it along pmean <- function(lis) Reduce("+",lis)/length(lis) res <- pmean( list(x1,x2,x3) ) > str(res) num [1:10, 1:10, 1:10] -0.879 0.843 -2.184 1.33 -0.675 ... -- David. > > Michael Weylandt > > On Oct 5, 2011, at 9:00 AM, Martin Batholdy > wrote: > >> Hi, >> >> I have multiple three dimensional arrays. 
>> >> Like this: >> >> x1 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) >> x2 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) >> x3 <- array(rnorm(1000, 1, 2), dim=c(10, 10, 10)) >> >> >> Now I would like to compute the mean for each corresponding cell. >> As a result I want to get one 3D array (10 x 10 x 10) in which at >> position x, y, z is the mean of the corresponding values of x1, x2 >> and x3 (at position x, y, z). >> >> >> How can I do this? >> e. From tariqdaudi at gmail.com Wed Oct 5 15:35:29 2011 From: tariqdaudi at gmail.com (Tariq) Date: Wed, 5 Oct 2011 06:35:29 -0700 (PDT) Subject: [R] How to make an orderly matrix from geostatistical data? Message-ID: <1317821729773-3874499.post@n4.nabble.com> Hi everybody, I used the krige.conv command (geoR package) to create a new data set. The input was a matrix with three spatial coordinates (x, y, z) in the first three columns and the value of a variable in the last column. The output is... a weird sequence of numbers. How can I make this output into the same format as the input, i.e. a nice matrix with three columns of coordinates and a last column with said variable? Kind regards and thank you for helping! -- View this message in context: http://r.789695.n4.nabble.com/How-to-make-an-orderly-matrix-from-geostatistical-data-tp3874499p3874499.html Sent from the R help mailing list archive at Nabble.com. From pat2 at hi.is Wed Oct 5 16:01:48 2011 From: pat2 at hi.is (Panagiotis) Date: Wed, 5 Oct 2011 07:01:48 -0700 (PDT) Subject: [R] best way to further analyse a mixed model? Message-ID: <1317823308786-3874592.post@n4.nabble.com> Hi, I want to ask which way is more effective to further analyse (multiple comparisons) a mixed model repeated measures anova with 2 fixed factor and 1 random? anova(lme(expr~treatment*age,random=~1|trial, data) Is searching for an effect of one factor in each of the subsamples defined by the second factor the best way anova(lme(expr~age,subset=treatment=="sal",random=~1|trial, data). 
or use multcomp? Thank you Panagiotis -- View this message in context: http://r.789695.n4.nabble.com/best-way-to-further-analyse-a-mixed-model-tp3874592p3874592.html Sent from the R help mailing list archive at Nabble.com. From dethlef1 at hotmail.com Wed Oct 5 16:06:30 2011 From: dethlef1 at hotmail.com (behave) Date: Wed, 5 Oct 2011 07:06:30 -0700 (PDT) Subject: [R] "stepwise" sum Message-ID: <1317823590115-3874606.post@n4.nabble.com> dear R-Community is there a function which sums data "stepwise" exp: 2 1 4 5 Desired result 2 = 2 2+1 = 3 2+1+4 = 7 2+1+4+5 = 12 Is there a built in function for this? Thx Dom -- View this message in context: http://r.789695.n4.nabble.com/stepwise-sum-tp3874606p3874606.html Sent from the R help mailing list archive at Nabble.com. From honeyoak at gmail.com Wed Oct 5 16:57:25 2011 From: honeyoak at gmail.com (honeyoak) Date: Wed, 5 Oct 2011 07:57:25 -0700 (PDT) Subject: [R] dynamically creating functions in r Message-ID: <1317826645792-3874767.post@n4.nabble.com> it is possible to dynamically create functions in R using lists? what I want to do is something like this: a = list() for (i in 1:10) a[[i]] = function(seed = i) runif(seed) so that when I call a[i] I get random draws 1,2,....i unfortunately R only uses the last i . I would also like to know if there is a run-all function without explicitly looping or using lapply. for example if I have a list 'b' of functions if I called run-all(b) all the functions in list 'b' would be run thanks. -- View this message in context: http://r.789695.n4.nabble.com/dynamically-creating-functions-in-r-tp3874767p3874767.html Sent from the R help mailing list archive at Nabble.com. From kanshined1 at gmail.com Wed Oct 5 17:05:21 2011 From: kanshined1 at gmail.com (Eugene Kanshin) Date: Wed, 5 Oct 2011 11:05:21 -0400 Subject: [R] white on black theme for ggplot2 Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From venerealdisease2011 at gmail.com Wed Oct 5 17:30:31 2011 From: venerealdisease2011 at gmail.com (venerealdisease) Date: Wed, 5 Oct 2011 08:30:31 -0700 (PDT) Subject: [R] about the array transpose In-Reply-To: References: <1317605351283-3866241.post@n4.nabble.com> <4E89ED43.50301@statistik.tu-dortmund.de> Message-ID: <1317828631836-3874870.post@n4.nabble.com> many thanks. I will try to figure it out. -- View this message in context: http://r.789695.n4.nabble.com/about-the-array-transpose-tp3866241p3874870.html Sent from the R help mailing list archive at Nabble.com. From nvanzuydam at gmail.com Wed Oct 5 17:53:20 2011 From: nvanzuydam at gmail.com (natalie.vanzuydam) Date: Wed, 5 Oct 2011 08:53:20 -0700 (PDT) Subject: [R] Subsetting a data frame with multiple values and exclusions. Message-ID: <1317830000927-3874967.post@n4.nabble.com> Hi all, I realise that the convention is to provide a working example of my problem but the data are of a sensitive nature so I'm not able to do that in this case. I need to query a database for multiple search terms: db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"), test1 = c(1, 2, 1.3, 3), test2 = c(56L, 27L, 58L, 2L), test3 = c(1.1, 28, 9, 1.2)), .Names = c("ind", "test1", "test2", "test3"), class = "data.frame", row.names = c(NA, -4L)) terms_include <- c("1","2","3") terms_exclude <- c("1.1","1.2","1.3") So I need to write a loop where the search of each value in the list of terms_include is searched over the entire data frame. I thought of using apply with grepl and subset? At the same time if the value of terms_include occurs in the same row as values from terms_exclude then that row must be excluded from the output dataframe. I'm not sure where to even begin. I've only worked very basically with subset. 
The final database is much larger and the number of search terms is many more than are presented here so I would really need to be able to loop over the data frame successively to return a final df with my searched values in at least one of the columns. Your help and assistance is much appreciated, Natalie ----- Natalie Van Zuydam PhD Student University of Dundee nvanzuydam at dundee.ac.uk -- View this message in context: http://r.789695.n4.nabble.com/Subsetting-a-data-frame-with-multiple-values-and-exclusions-tp3874967p3874967.html Sent from the R help mailing list archive at Nabble.com. From sina.rueeger at gmail.com Wed Oct 5 19:18:37 2011 From: sina.rueeger at gmail.com (sina rueeger) Date: Wed, 5 Oct 2011 10:18:37 -0700 (PDT) Subject: [R] reporting multiple objects out of a function In-Reply-To: <1317788874982-3873380.post@n4.nabble.com> References: <1317788874982-3873380.post@n4.nabble.com> Message-ID: <1317835117366-3875488.post@n4.nabble.com> Hi Andrew I am not sure if I understood your question entirely. You want to store some objects, but not in the global environment. Correct?! I would do it like this (although I am sure that there is a more elegant way to do this).
## ----------------------------------------------------------------
obj1 <- 2
attach(what = NULL, name = "my_env") ## create new environment
assign("obj1", obj1, envir = as.environment("my_env")) ## assign obj1 to new environment
rm(list = ls()) ## remove all objects from global environment
obj1 ## still available
ls("my_env") ## still available in environment "my_env"
## ----------------------------------------------------------------
Regards, Sina -- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3875488.html Sent from the R help mailing list archive at Nabble.com.
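For comparison, here is a minimal sketch of two base-R alternatives that avoid attach(). The first returns several results from one function as a single named list. The second keeps objects in a private environment created with new.env(). The names summarise_values and store are hypothetical. Unlike attach(), new.env() does not add anything to the search path, so objects are retrieved explicitly with get() or store$obj1.

```r
# Option 1: return several results from one function as a named list
summarise_values <- function(x) {
  list(mean = mean(x), sd = sd(x), n = length(x))
}
res <- summarise_values(c(2, 4, 6, 8))
res$mean   # 5
res$n      # 4

# Option 2: keep objects out of the global environment in a private
# environment; new.env() does not touch the search path
store <- new.env()
assign("obj1", 2, envir = store)   # equivalent to store$obj1 <- 2
get("obj1", envir = store)         # 2
ls(store)                          # "obj1"
exists("obj1", envir = store, inherits = FALSE)   # TRUE
```

A named list is usually the simpler choice when a function has to hand back several objects. An environment is useful when the objects must be modified in place from several places.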
From Zuofeng.Shang.5 at nd.edu Wed Oct 5 19:44:18 2011 From: Zuofeng.Shang.5 at nd.edu (JeffND) Date: Wed, 5 Oct 2011 10:44:18 -0700 (PDT) Subject: [R] A question about R image function Message-ID: <1317836658333-3875581.post@n4.nabble.com> Dear folks, I have a question about the image() function in R. I found the following link talking about this but the replies didn't help with my situations. http://r.789695.n4.nabble.com/question-on-image-function-td839275.html#a839276 To be simple, I will keep using the example in the above link. Suppose the data are like x y mcpvalue 0.4603578 0.6247629 1.001 0.4603715 0.6247788 1.001 0.4603852 0.6247948 1.001 0.4110561 0.5664841 0.995 So we have four points with coordinates given by the four pairs of (x,y) values. Each point is associated with a mcpvalue. How do we use image() to plot "mcpvalue" as a two-dimensional plot? I hope that the points are positioned by the coordinates and the color of each point in that plot is changing with the value of mcpvalue. Using image() does not work as the coordinates of the points are not ascending. Thanks a lot! Jeff -- View this message in context: http://r.789695.n4.nabble.com/A-question-about-R-image-function-tp3875581p3875581.html Sent from the R help mailing list archive at Nabble.com. From rafalpedzimaz at gmail.com Wed Oct 5 19:56:09 2011 From: rafalpedzimaz at gmail.com (rafal) Date: Wed, 5 Oct 2011 10:56:09 -0700 (PDT) Subject: [R] Needed help with 3 factor anova !!! Message-ID: <1317837369210-3875620.post@n4.nabble.com> I am a student from Poland. What I am interested in is 3 factor anova with R. Could you please help me find an example with using this method with R? With all possible countable output for anova as the output presents with 3 factor anova with spss? I would be glad with any help. -- View this message in context: http://r.789695.n4.nabble.com/Needed-help-with-3-factor-anova-tp3875620p3875620.html Sent from the R help mailing list archive at Nabble.com. 
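For the three-factor ANOVA question, here is a minimal sketch in base R using simulated data. The factor names A, B, C, the response y, and the effect sizes are all hypothetical. In the formula, A * B * C expands to all main effects plus the two-way and three-way interactions. Note that summary() on an aov fit reports sequential (Type I) sums of squares. SPSS GLM defaults to Type III, so the two only agree for balanced designs like this one.

```r
set.seed(42)
# Hypothetical balanced design: A (2 levels) x B (3 levels) x C (2 levels),
# 4 replicates per cell, 48 observations in total
d <- expand.grid(A = factor(c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")),
                 C = factor(c("c1", "c2")),
                 rep = 1:4)
# Simulated response with a main effect of A only
d$y <- rnorm(nrow(d), mean = 10) + ifelse(d$A == "a2", 1.5, 0)

# A * B * C = all main effects, 2-way and 3-way interactions
fit <- aov(y ~ A * B * C, data = d)
summary(fit)                       # ANOVA table (sequential, Type I SS)
model.tables(fit, type = "means")  # marginal and cell means
TukeyHSD(fit, "B")                 # pairwise comparisons for factor B
```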
From ask4rauf at yahoo.com Wed Oct 5 20:48:05 2011 From: ask4rauf at yahoo.com (rauf ibrahim) Date: Wed, 5 Oct 2011 11:48:05 -0700 Subject: [R] variance ratio test Message-ID: <1317840485.62313.YahooMailNeo@web161519.mail.bf1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From donaldngwe at gmail.com Wed Oct 5 16:38:29 2011 From: donaldngwe at gmail.com (darkgaze) Date: Wed, 5 Oct 2011 07:38:29 -0700 (PDT) Subject: [R] Create combinations of rows In-Reply-To: <4E8BDEBF.2000507@yahoo.de> References: <1317766875225-3872641.post@n4.nabble.com> <4E8BDEBF.2000507@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From karel.viaene at ugent.be Wed Oct 5 16:21:04 2011 From: karel.viaene at ugent.be (Karel V) Date: Wed, 5 Oct 2011 07:21:04 -0700 (PDT) Subject: [R] gamm: problems with corCAR1() Message-ID: <1317824464784-3874669.post@n4.nabble.com> Dear all, I'm analyzing this dataset containing biodiversity indices, measured over time (Week) and at various contaminant concentrations (Treatment). We have two replicates (Replicate) per treatment. I'm looking for the effects of time (Week) and contaminant concentration (Treatment) on diversity indices (e.g. richness). Initial analysis with GAM models showed temporal autocorrelation of diversity, so now I'm trying to fit this gamm (gamm1): gamm1 <- gamm(richness~ s(Week,by=as.numeric(Treatment=="0"),k=6) + s(Week,by=as.numeric(Treatment=="0.5"),k=6) + s(Week,by=as.numeric(Treatment=="5"),k=6) + s(Week,by=as.numeric(Treatment=="15"),k=6) + s(Week,by=as.numeric(Treatment=="50"),k=6) + s(Week,by=as.numeric(Treatment=="150"),k=6) + s(Treatment,k=6,fx=FALSE) + factor(Treatment), correlation=corCAR1(form=~Week|factor(Treatment)), data=indices, family=gaussian) I seem to be having difficulties with the correlation structure.
An initial error occurred because replicates were taken at the same time: "Error in Initialize.corCAR1(X[[2L]], ...) : Covariate must have unique values within groups for corCAR1 objects". I solved this by selecting one replicate, but is there another solution for this? Moreover, when analyzing the data of one replicate, I received the following error: "Error in MEestimate(lmeSt, grps) : Singularity in backsolve at level 0, block 1". I have no idea how to solve this. It seems to be related to the complexity of the model, because no error occurred when running a simpler gamm (gamm2): gamm2 <- gamm(richness~ s(Week,k=6,fx=FALSE) + factor(Treatment), correlation=corCAR1(form=~Week|conc.f), data=test,family=gaussian) Any help would be much appreciated! With kind regards, Karel -- View this message in context: http://r.789695.n4.nabble.com/gamm-problems-with-corCAR1-tp3874669p3874669.html Sent from the R help mailing list archive at Nabble.com. From raji.sankaran at gmail.com Wed Oct 5 17:10:01 2011 From: raji.sankaran at gmail.com (Raji) Date: Wed, 5 Oct 2011 08:10:01 -0700 (PDT) Subject: [R] Reg : read missing values from database using RJDBC In-Reply-To: <1297941872292-3310591.post@n4.nabble.com> References: <1296728342074-3257766.post@n4.nabble.com> <1297941872292-3310591.post@n4.nabble.com> Message-ID: <1317827401264-3874809.post@n4.nabble.com> Hi All, This seems to be a bug in the RJDBC package, and it has been fixed in the latest RJDBC_0.1-6 version. I would like to try out RJDBC_0.1-6. Can you please guide me to a link where I can find the 64-bit RJDBC_0.1-6.zip? I could find the 32-bit version at http://cran.sixsigmaonline.org/bin/windows/contrib/2.11/ Thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/Reg-read-missing-values-from-database-using-RJDBC-tp3257766p3874809.html Sent from the R help mailing list archive at Nabble.com.
From bhdavis1978 at gmail.com Wed Oct 5 21:00:38 2011 From: bhdavis1978 at gmail.com (Brad Davis) Date: Wed, 5 Oct 2011 12:00:38 -0700 Subject: [R] Difficulty with lme Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From justin.balko at Vanderbilt.Edu Wed Oct 5 21:01:05 2011 From: justin.balko at Vanderbilt.Edu (Balko, Justin) Date: Wed, 5 Oct 2011 14:01:05 -0500 Subject: [R] Odd gridding pattern when plotting In-Reply-To: <4E871973.2090605@statistik.tu-dortmund.de> References: <359118272EA85C409A0B9BEA0B2E11DE14F37894F3@its-hcwnem04.ds.Vanderbilt.edu> <359118272EA85C409A0B9BEA0B2E11DE14F37894FA@its-hcwnem04.ds.Vanderbilt.edu> <00a901cc7fa2$5a3983d0$0eac8b70$@edu> <359118272EA85C409A0B9BEA0B2E11DE14F3789501@its-hcwnem04.ds.Vanderbilt.edu> <4E871973.2090605@statistik.tu-dortmund.de> Message-ID: <359118272EA85C409A0B9BEA0B2E11DE14F3789513@its-hcwnem04.ds.Vanderbilt.edu> Thanks Uwe, The patched 2.13.2 solves this issue. Best, Justin M. Balko, Pharm.D., Ph.D. Research Fellow, Arteaga Lab Department of Medicine Division of Hematology/Oncology Vanderbilt University 777 Preston Research Building Nashville TN, 37232-6307 Ph: 615-936-1495 -----Original Message----- From: Uwe Ligges [mailto:ligges at statistik.tu-dortmund.de] Sent: Saturday, October 01, 2011 8:45 AM To: Balko, Justin Cc: dcarlson at tamu.edu; r-help at r-project.org Subject: Re: [R] Odd gridding pattern when plotting I think you found a bug introduced in R-2.13.x that has been fixed in R-2.13.2 which has been released yesterday. Best, Uwe Ligges On 30.09.2011 21:36, Balko, Justin wrote: > Thanks, that kind of helps. However, some of my previous code uses functions like heatmap.2 which has multiple images (legend/color key) as well as the actual heatmap. Employing useRaster=TRUE here only applies to the heatmap and not the legend. Not a huge deal. Is there anyway to set an option in R to always use rastering when drawing in the interface? 
> Thanks again, > Justin > > -----Original Message----- > From: David L Carlson [mailto:dcarlson at tamu.edu] > Sent: Friday, September 30, 2011 1:54 PM > To: Balko, Justin; r-help at r-project.org > Subject: RE: [R] Odd gridding pattern when plotting > >> From ?image > > " Images for large z on a regular grid are more efficient with useRaster enabled and can prevent rare anti-aliasing artifacts, but may not be supported by all graphics devices." > > Adding useRaster=TRUE to the two image() calls gets rid of the white grid lines. > > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Balko, Justin > Sent: Friday, September 30, 2011 10:43 AM > To: r-help at r-project.org > Subject: [R] Odd gridding pattern when plotting > > > Hi, I'm no longer on the subscribing list, but was hoping to get my question posted. Please inform if this is ok, although I am guessing you wont post with the image below. If so, let me know and I will resend without the image. > Thanks > > > Hi, > I just upgraded my system and my version of R all at once. Upon running old code for heatmaps etc, I suddenly notice that there is an odd grid pattern appearing in all of my plots. An example is below: > > #example from ?image > > require(grDevices) # for colours > x<- y<- seq(-4*pi, 4*pi, len=27) > r<- sqrt(outer(x^2, y^2, "+")) > image(z = z<- cos(r^2)*exp(-r/6), col=gray((0:32)/32)) image(z, axes = FALSE, main = "Math can be beautiful ...", > xlab = expression(cos(r^2) * e^{-r/6})) contour(z, add = TRUE, drawlabels = FALSE) > > > > Any ideas what is causing this? I can't seem to figure it out. I'm not sure the bmp image can/will be posted, so maybe you can just take my word for it. It is a gridding pattern in white, that appears over the plot area only. Vertical lines are every 4 units, evenly spaced. Horizontal lines appear at every unit, then stop for a while (6-7 units, then appear every unit for 4-5 units). 
Simple plots like plot(x,y) do not seem to produce it, or at least I can't see it. Any ideas are helpful. > Thanks! > > > Justin M. Balko, Pharm.D., Ph.D. > Research Fellow, Arteaga Lab > Department of Medicine > Division of Hematology/Oncology > Vanderbilt University > 777 Preston Research Building > Nashville TN, 37232-6307 > Ph: 615-936-1495 From upananda.pani at gmail.com Wed Oct 5 19:09:24 2011 From: upananda.pani at gmail.com (upananda pani) Date: Wed, 5 Oct 2011 22:39:24 +0530 Subject: [R] creating a loop for a function Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From yiy83102 at nate.com Wed Oct 5 20:38:22 2011 From: yiy83102 at nate.com (yiy83102) Date: Thu, 06 Oct 2011 03:38:22 +0900 Subject: [R] Usng MCMCpack,error is "initial value in vmmin is not finite" Message-ID: From zoubidoo at hotmail.com Wed Oct 5 15:08:44 2011 From: zoubidoo at hotmail.com (Parker Jones) Date: Thu, 6 Oct 2011 02:08:44 +1300 Subject: [R] Display a contingency table on the X11 device Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From scott.raynaud at yahoo.com Wed Oct 5 17:54:45 2011 From: scott.raynaud at yahoo.com (Scott Raynaud) Date: Wed, 5 Oct 2011 08:54:45 -0700 (PDT) Subject: [R] SPlus to R In-Reply-To: References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> Message-ID: <1317830085.62643.YahooMailNeo@web120615.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From scott.raynaud at yahoo.com Wed Oct 5 15:48:49 2011 From: scott.raynaud at yahoo.com (Scott Raynaud) Date: Wed, 5 Oct 2011 06:48:49 -0700 (PDT) Subject: [R] SPlus to R In-Reply-To: <4E8C558E.8070707@statistik.tu-dortmund.de> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> <4E8C558E.8070707@statistik.tu-dortmund.de> Message-ID: <1317822529.69203.YahooMailNeo@web120614.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sarah.goslee at gmail.com Wed Oct 5 21:14:54 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 5 Oct 2011 15:14:54 -0400 Subject: [R] dynamically creating functions in r In-Reply-To: <1317826645792-3874767.post@n4.nabble.com> References: <1317826645792-3874767.post@n4.nabble.com> Message-ID: Hi, On Wed, Oct 5, 2011 at 10:57 AM, honeyoak wrote: > it is possible to dynamically create functions in R using lists? what I want > to do is something like this: > > a = list() > for (i in 1:10) a[[i]] = function(seed = i) runif(seed) > > so that when I call a[i] I get random draws 1,2,....i unfortunately R only > uses the last i . I'm not sure I understand what you want. Do you want to set a new seed for the random number generator, or do you want a random vector of length i each time? If the former, I'm not sure why you'd want to do that, but your choice of variable names makes me wonder.
If the latter, you just need a bit of clean-up. a <- list() for (i in 1:10) { a[[i]] <- runif(i) } But that's not dynamically creating a function, so maybe I'm missing the point. > I would also like to know if there is a run-all function > without explicitly looping or using lapply. for example if I have a list 'b' > of functions if I called > > run-all(b) > > all the functions in list 'b' would be run > > thanks. What's wrong with lapply? I think we need to know more about what you're trying to do. You might also want to look at do.call(). Sarah -- Sarah Goslee http://www.functionaldiversity.org From sarah.goslee at gmail.com Wed Oct 5 21:15:42 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 5 Oct 2011 15:15:42 -0400 Subject: [R] "stepwise" sum In-Reply-To: <1317823590115-3874606.post@n4.nabble.com> References: <1317823590115-3874606.post@n4.nabble.com> Message-ID: > cumsum(c(2,1,4,5)) [1] 2 3 7 12 On Wed, Oct 5, 2011 at 10:06 AM, behave wrote: > dear R-Community > > is there a function which sums data "stepwise" > > exp: > > 2 > 1 > 4 > 5 > > Desired result > > 2 = 2 > 2+1 = 3 > 2+1+4 = 7 > 2+1+4+5 = 12 > > Is there a built in function for this?
> > Thx > Dom > -- Sarah Goslee http://www.functionaldiversity.org From NordlDJ at dshs.wa.gov Wed Oct 5 21:15:12 2011 From: NordlDJ at dshs.wa.gov (Nordlund, Dan (DSHS/RDA)) Date: Wed, 5 Oct 2011 12:15:12 -0700 Subject: [R] "stepwise" sum In-Reply-To: <1317823590115-3874606.post@n4.nabble.com> References: <1317823590115-3874606.post@n4.nabble.com> Message-ID: <941871A13165C2418EC144ACB212BDB002123D19@dshsmxoly1504g.dshs.wa.lcl> > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of behave > Sent: Wednesday, October 05, 2011 7:07 AM > To: r-help at r-project.org > Subject: [R] "stepwise" sum > > dear R-Community > > is there a function which sums data "stepwise" > > exp: > > 2 > 1 > 4 > 5 > > Desired result > > 2 = 2 > 2+1 = 3 > 2+1+4 = 7 > 2+1+4+5 = 12 > > Is there a built in function for this? > > Thx > Dom > > ?cumsum Hope this is helpful, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 From murdoch.duncan at gmail.com Wed Oct 5 21:18:14 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 05 Oct 2011 15:18:14 -0400 Subject: [R] dynamically creating functions in r In-Reply-To: <1317826645792-3874767.post@n4.nabble.com> References: <1317826645792-3874767.post@n4.nabble.com> Message-ID: <4E8CAD76.6080708@gmail.com> On 05/10/2011 10:57 AM, honeyoak wrote: > it is possible to dynamically create functions in R using lists? what I want > to do is something like this: > > a = list() > for (i in 1:10) a[[i]] = function(seed = i) runif(seed) > > so that when I call a[i] I get random draws 1,2,....i unfortunately R only > uses the last i . That is because you never evaluate it until you call the function. 
You can do what you want in several ways; one is for (i in 1:10) a[[i]] <- local( { default <- i; function(seed = default) runif(seed) } ) Duncan Murdoch > I would also like to know if there is a run-all function > without explicitly looping or using lapply. for example if I have a list 'b' > of functions if I called > > run-all(b) > > all the functions in list 'b' would be run > > thanks. > > -- > View this message in context: http://r.789695.n4.nabble.com/dynamically-creating-functions-in-r-tp3874767p3874767.html > Sent from the R help mailing list archive at Nabble.com. From sarah.goslee at gmail.com Wed Oct 5 21:18:23 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 5 Oct 2011 15:18:23 -0400 Subject: [R] variance ratio test In-Reply-To: <1317840485.62313.YahooMailNeo@web161519.mail.bf1.yahoo.com> References: <1317840485.62313.YahooMailNeo@web161519.mail.bf1.yahoo.com> Message-ID: Hi, Searching on http://www.rseek.org for "variance ratio test" turns up the vrtest package, as does searching for Lo and Mackinlay, suggesting that's a good place to start. Sarah On Wed, Oct 5, 2011 at 2:48 PM, rauf ibrahim wrote: > Hello, > I am looking for a code in R for the variance ratio test statistic (the > Lo and Mackinlay version or any other versions). > Does anybody have such a code they can share or know a library in which > I can find this function? > Basically I have a number of time series which I need to check for > persistence. One other test I can use is the runs test in the tseries > package. > Any help will be greatly appreciated. > Thanks a lot, > Rauf Ibrahim R.
> --- Sarah Goslee http://www.functionaldiversity.org From gunter.berton at gene.com Wed Oct 5 21:25:26 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Wed, 5 Oct 2011 12:25:26 -0700 Subject: [R] SPlus to R In-Reply-To: <1317830085.62643.YahooMailNeo@web120615.mail.ne1.yahoo.com> References: <1317779599.97667.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1317815060.69968.YahooMailNeo@web120614.mail.ne1.yahoo.com> <1317830085.62643.YahooMailNeo@web120615.mail.ne1.yahoo.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sarah.goslee at gmail.com Wed Oct 5 21:26:48 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 5 Oct 2011 15:26:48 -0400 Subject: [R] A question about R image function In-Reply-To: <1317836658333-3875581.post@n4.nabble.com> References: <1317836658333-3875581.post@n4.nabble.com> Message-ID: Hi, On Wed, Oct 5, 2011 at 1:44 PM, JeffND wrote: > Dear folks, > > I have a question about the image() function in R. I found the following > link talking about this > but the replies didn't help with my situations. > > http://r.789695.n4.nabble.com/question-on-image-function-td839275.html#a839276 In fact, that link states exactly what needs to be done, and why. > To be simple, I will keep using the example in the above link. > > Suppose the data are like > > x y mcpvalue > 0.4603578 0.6247629 1.001 > 0.4603715 0.6247788 1.001 > 0.4603852 0.6247948 1.001 > 0.4110561 0.5664841 0.995 > > So we have four points with coordinates given by the four pairs of (x,y) > values. > Each point is associated with a mcpvalue. > How do we use image() to plot "mcpvalue" as a two-dimensional plot? > I hope that the points are positioned by the coordinates and the color of > each point > in that plot is changing with the value of mcpvalue. We don't. image() is intended to plot data on a regular grid, and that's not a regular grid.
You need to use kriging or some other form of spatial interpolation to fit it to a regular grid before you can use image(). You can however use plot() with the col= argument to plot points at the actual values of the coordinates, and to color them as desired. testdata <- structure(list(x = c(0.4603578, 0.4603715, 0.4603852, 0.4110561 ), y = c(0.6247629, 0.6247788, 0.6247948, 0.5664841), mcpvalue = c(1.001, 1.001, 1.001, 0.995)), .Names = c("x", "y", "mcpvalue"), class = "data.frame", row.names = c(NA, -4L)) testdata.colors <- cut(testdata$mcpvalue, c(0, 1, 2)) testdata.colors <- rainbow(length(testdata.colors))[testdata.colors] with(testdata, plot(x, y, col=testdata.colors)) Which is kind of useless in this case, but may work with your actual data, which is why I wrote it out so elaborately. > Using image() does not work as the coordinates of the points > are not ascending. Exactly. So you can't use image(). -- Sarah Goslee http://www.functionaldiversity.org From littledude.jarvis at gmail.com Wed Oct 5 21:32:30 2011 From: littledude.jarvis at gmail.com (justin jarvis) Date: Wed, 5 Oct 2011 12:32:30 -0700 Subject: [R] calling a variable which in turn calls many more variables Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Wed Oct 5 21:34:16 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 15:34:16 -0400 Subject: [R] creating a loop for a function In-Reply-To: References: Message-ID: lapply(1:10, function(i) Box.test (lfut, lag = i, type="Ljung")) Add extractors to get statistics as desired. Michael Weylandt On Wed, Oct 5, 2011 at 1:09 PM, upananda pani wrote: > Dear All, > > I want to create a loop within a function r. The example follows: > > Box.test (lfut, lag = 1, type="Ljung") > > if i want to compute the Box.test for lag 1 to 10, I have to write manually > change each time for different lag.
So i wan to write a loop for the lag 1 > to 10 and return the statistics for each lag. Is there any method to do this > ? > > With regards, > Upananda > > -- > > > You may delay, but time will not. > > > Research Scholar > alternative mail id: upani at iitkgp.ac.in > Department of HSS, IIT KGP > KGP > > [[alternative HTML version deleted]] > From izahn at psych.rochester.edu Wed Oct 5 21:37:45 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Wed, 5 Oct 2011 15:37:45 -0400 Subject: [R] calling a variable which in turn calls many more variables In-Reply-To: References: Message-ID: Hi Justin, On Wed, Oct 5, 2011 at 3:32 PM, justin jarvis wrote: > Hi all, > I am running regressions with many covariates, most of which remain the same > each time (control variables). Instead of writing 30 demographic variables > every regression, is there a way I could call them all at once using a > variable called, perhaps "demog"? I would create a base model with just the covariates, and then use update() to add other variables. Best, Ista > > I have tried: >> demog <- list(age1, age2, age3) but I get an error when I try to call a > list in a regression. > > I also tried: >> demog <- cbind(age1, age2, age3) which allows me to run a regression, but > this is not practical because when I subset the original data set and run a > regression, this new matrix demog doesn't get subsetted as well, so the > variables are of differing length. > > I'm thinking there is an easy way to do this. Thanks for any help > guys/gals. > > Justin > PhD student, > University of California, Irvine > >
[[alternative HTML version deleted]] > > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From emorway at usgs.gov Wed Oct 5 21:40:03 2011 From: emorway at usgs.gov (emorway) Date: Wed, 5 Oct 2011 12:40:03 -0700 (PDT) Subject: [R] subplot strange behavoir Message-ID: <1317843603409-3875917.post@n4.nabble.com> Hello, Below is some example code that should reproduce an error I'm encountering while trying to create a tiff plot with two subplots. If I run just the following bit of code through the R GUI the result is what I'd like to have appear in the saved tiff image: x<-seq(0:20) y<-c(1,1,2,2,3,4,5,4,3,6,7,1,1,2,2,3,4,5,4,3,6) plot(x,y,type="l",las=1,ylim=c(0,12)) subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c(1,1.5)) However, if expanding on this code with: edm.sub<-function(x,y){plot(x,y,col="red",frame.plot=F, las=1,xaxs="i",yaxs="i",type="b", ylim=c(0,6),xlab="",ylab="")} png("c:/temp/lookat.tif",res=120,height=600,width=1200) layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") plot(x,y,type="l",las=1,ylim=c(0,12)) subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c(1,1.5)) dev.off() One will notice the second subplot is out of position (notice the y-coordinate is the same for both subplots...y=9): http://r.789695.n4.nabble.com/file/n3875917/lookat.png If I try to 'guess' a new y-coordinate for the second subplot, say y=10: png("c:/temp/lookat.tif",res=120,height=600,width=1200)
layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") plot(x,y,type="l",las=1,ylim=c(0,12)) subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=10,size=c(1,1.5)) dev.off() R kicks back the following message Error in plot.new() : plot region too large Am I mis-using subplot? Thanks, Eric -- View this message in context: http://r.789695.n4.nabble.com/subplot-strange-behavoir-tp3875917p3875917.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Wed Oct 5 21:49:52 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 5 Oct 2011 12:49:52 -0700 Subject: [R] Subsetting a data frame with multiple values and exclusions. In-Reply-To: <1317830000927-3874967.post@n4.nabble.com> References: <1317830000927-3874967.post@n4.nabble.com> Message-ID: Hi: Is this what you're after? f <- function(x) !any(x %in% terms_exclude) && any(x %in% terms_include) db[apply(db[, -1], 1, f), ] ind test1 test2 test3 2 ind2 2 27 28.0 4 ind4 3 2 1.2 HTH, Dennis On Wed, Oct 5, 2011 at 8:53 AM, natalie.vanzuydam wrote: > Hi all, > > I realise that the convention is to provide a working example of my problem > but the data are of a sensitive nature so I'm not able to do that in this > case. > > I need to query a database for multiple search terms: > > db <- structure(list(ind = c("ind1", "ind2", "ind3", "ind4"), test1 = c(1, > 2, 1.3, 3), test2 = c(56L, 27L, 58L, 2L), test3 = c(1.1, 28, > 9, 1.2)), .Names = c("ind", "test1", "test2", "test3"), class = > "data.frame", row.names = c(NA, > -4L)) > > terms_include <- c("1","2","3") > terms_exclude <- c("1.1","1.2","1.3") > > So I need to write a loop where the search of each value in the list of > terms_include is searched over the entire data frame. I thought of using > apply with grepl and subset?
At the same time if the value of terms_include > occurs in the same row as values from terms_exclude then that row must be > excluded from the output dataframe. > > I'm not sure where to even begin. I've only worked very basically with > subset. The final database is much larger and the number of search terms is > many more than are presented here so I would really need to be able to loop > over the data frame successively to return a final df with my searched > values in at least one of the columns. > > Your help and assistance is much appreciated, > Natalie > > > > ----- > Natalie Van Zuydam > > PhD Student > University of Dundee > nvanzuydam at dundee.ac.uk > -- > View this message in context: http://r.789695.n4.nabble.com/Subsetting-a-data-frame-with-multiple-values-and-exclusions-tp3874967p3874967.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From wdunlap at tibco.com Wed Oct 5 21:55:21 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 5 Oct 2011 19:55:21 +0000 Subject: [R] dynamically creating functions in r In-Reply-To: <1317826645792-3874767.post@n4.nabble.com> References: <1317826645792-3874767.post@n4.nabble.com> Message-ID: Creating expressions and functions dynamically can be tricky. Usually I use functions like call(), substitute(), and formals(); very occasionally I use parse(text=).
Here is one way to make a family of functions that differ only in the default value of their argument: > funsA <- lapply(1:3, function(i){ retval <- function(arg=i)arg^2 formals(retval)$arg <- i retval }) > sapply(funsA, function(f)f()) [1] 1 4 9 > funsA[[2]] function (arg = 2L) arg^2 Here is a way to make the functions differ in their bodies: > funsB <- lapply(c("sin", "cos", "sqrt"), function(fname) eval(substitute(function(x)f(x)^2, list(f=as.name(fname))))) > sapply(funsB, function(f)f(pi/3)) [1] 0.750000 0.250000 1.047198 > funsB[[2]] function (x) cos(x)^2 You can also add things to environment(yourFunction), where you arrange that each function has its own personal environment, instead of altering the function itself. This works, but can look a bit mysterious to the naïve user who doesn't know to look in the environment of the function: > funsC <- lapply(1:3, function(i){ retval <- function(arg=i)arg^2 with(environment(retval), i <- i) retval }) > sapply(funsC, function(f)f()) [1] 1 4 9 > funsC[[2]] function (arg = i) arg^2 > as.list(environment(funsC[[2]])) $retval function (arg = i) arg^2 $i [1] 2 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of honeyoak > Sent: Wednesday, October 05, 2011 7:57 AM > To: r-help at r-project.org > Subject: [R] dynamically creating functions in r > > it is possible to dynamically create functions in R using lists? what I want > to do is something like this: > > a = list() > for (i in 1:10) a[[i]] = function(seed = i) runif(seed) > > so that when I call a[i] I get random draws 1,2,....i unfortunately R only > uses the last i . I would also like to know if there is a run-all function > without explicitly looping or using lapply. for example if I have a list 'b' > of functions if I called > > run-all(b) > > all the functions in list 'b' would be run > > thanks.
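One more sketch for the question quoted above, assuming base R only. In the quoted loop the default `seed = i` is evaluated only when a function is first called, and it then finds the single global `i`, which is 10 once the loop has finished. A small factory that calls force() fixes each value at creation time, and "run all" is a single lapply() call. The names make_drawer and run_all are hypothetical helpers, not functions from any package.

```r
# Hypothetical closure factory: force() evaluates `n` immediately, so each
# returned function keeps its own value rather than a promise pointing at `i`.
make_drawer <- function(n) {
  force(n)
  function() runif(n)
}

# Works in a plain for loop ...
a <- vector("list", 10)
for (i in 1:10) a[[i]] <- make_drawer(i)

# ... and with lapply().
b <- lapply(1:10, make_drawer)

# Hypothetical "run all": call every function in a list and collect the results.
run_all <- function(fns) lapply(fns, function(f) f())
draws <- run_all(a)
sapply(draws, length)  # 1 2 3 ... 10
```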
> > -- > View this message in context: http://r.789695.n4.nabble.com/dynamically-creating-functions-in-r- > tp3874767p3874767.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From gunter.berton at gene.com Wed Oct 5 21:55:52 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Wed, 5 Oct 2011 12:55:52 -0700 Subject: [R] calling a variable which in turn calls many more variables In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Wed Oct 5 21:57:19 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 15:57:19 -0400 Subject: [R] cuhre usage ?? multidimensional integration In-Reply-To: <1317792382956-3873478.post@n4.nabble.com> References: <1317792382956-3873478.post@n4.nabble.com> Message-ID: Perhaps you should start by writing vectorized code; as it stands, your code suggests you don't understand what simple operations like y <- x actually do. More to your question: what are cuhre & crff ? They are not in base R nor in any packages I have current loaded. Michael On Wed, Oct 5, 2011 at 1:26 AM, sevenfrost wrote: > my=function(x){ > len=1 > for(i in 1:len){ > y[i]=x[i] > } > g=1 > w=NULL > t=NULL > for(i in 1:len)w[i]=x[i+len] > for(i in 1:len)t[i]=x[i+2*len] > for(i in 1:len)g=g*dnorm(y[i])*dnorm(w[i])*dnorm(z[i]) > return(g) > } > cuhre(6,1,my,rep(-100,6),rep(100,6)) > > Error in crff(match.call(), integrand, "cuhre", libargs, ...) : > Additional argument not expected in the integrand function > > function change to my=function(x,g,i,j) > result is not right. it should be 1, but it turns out to be 0.039...
> > How can I make this work? > > Thank you! > > -- > View this message in context: http://r.789695.n4.nabble.com/cuhre-usage-multidimensional-integration-tp3873478p3873478.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From sarah.goslee at gmail.com Wed Oct 5 22:06:32 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 5 Oct 2011 16:06:32 -0400 Subject: [R] subplot strange behavoir In-Reply-To: <1317843603409-3875917.post@n4.nabble.com> References: <1317843603409-3875917.post@n4.nabble.com> Message-ID: Hi, I'm assuming you're using subplot() from Hmisc, but it's a good idea to specify. It's not subplot() that's causing the problem, it's layout, or rather the interaction between the two. This section run at the command line doesn't work: layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") plot(x,y,type="l",las=1,ylim=c(0,12)) subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c(1,1.5)) You appear to have run afoul of two things: from ?subplot Any graphical parameter settings that you would like to be in place before ‘fun’ is evaluated can be specified in the ‘pars’ argument (warning: specifying layout parameters here (‘plt’, ‘mfrow’, etc.) may cause unexpected results). and from ?layout These functions are totally incompatible with the other mechanisms for arranging plots on a device: ‘par(mfrow)’, ‘par(mfcol)’ and ‘split.screen’. And also apparently subplot(). You could try asking the package maintainer, but I think you may be better off making two separate figures.
Or you could delve into the mysteries of par(), of course. Incidentally, you can't use png() to make a tif, no matter what you call it. You probably want tiff() instead. Sarah On Wed, Oct 5, 2011 at 3:40 PM, emorway wrote: > Hello, > > Below is some example code that should reproduce an error I'm encountering > while trying to create a tiff plot with two subplots. If I run just the > following bit of code through the R GUI the result is what I'd like to have > appear in the saved tiff image: > > x<-seq(0:20) > y<-c(1,1,2,2,3,4,5,4,3,6,7,1,1,2,2,3,4,5,4,3,6) > plot(x,y,type="l",las=1,ylim=c(0,12)) > subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) > subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c(1,1.5)) > > However, if expanding on this code with: > > edm.sub<-function(x,y){plot(x,y,col="red",frame.plot=F, > las=1,xaxs="i",yaxs="i",type="b", > ylim=c(0,6),xlab="",ylab="")} > > png("c:/temp/lookat.tif",res=120,height=600,width=1200) > layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) > plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") > plot(x,y,type="l",las=1,ylim=c(0,12)) > subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) > subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c(1,1.5)) > dev.off() > > One will notice the second subplot is out of position (notice the > y-coordinate is the same for both subplots...y=9): > http://r.789695.n4.nabble.com/file/n3875917/lookat.png > > If I try to 'guess' a new y-coordinate for the second subplot, say y=10: > > png("c:/temp/lookat.tif",res=120,height=600,width=1200) > layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) > plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") > plot(x,y,type="l",las=1,ylim=c(0,12)) > subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) > subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=10,size=c(1,1.5)) > dev.off() > > R kicks back
the following message > Error in plot.new() : plot region too large > > Am I mis-using subplot? > > Thanks, Eric > -- Sarah Goslee http://www.functionaldiversity.org From joseph.g.boyer at gsk.com Wed Oct 5 22:11:30 2011 From: joseph.g.boyer at gsk.com (Joseph Boyer) Date: Wed, 5 Oct 2011 20:11:30 +0000 Subject: [R] Variability plot in R In-Reply-To: References: <25F41B8FF173D643B590A4D77AF71FC50516C178@019D-NAMSG-06.019D.MGD.MSFT.NET> Message-ID: Dennis, Thank you for your reply. This is a good start for what I want to achieve. -- Joe -----Original Message----- From: Dennis Murphy [mailto:djmuser at gmail.com] Sent: Friday, May 20, 2011 10:55 PM To: Joseph Boyer Cc: r-help at r-project.org Subject: Re: [R] Variability plot in R Here's one attempt; I only used five of the wafers since you didn't provide any data. dd <- data.frame(wafer = factor(rep(1:5, each = 6)), operator = factor(rep(rep(1:3, each = 2), 5)), thickness = c(0.62, 0.66, 0.53, 0.53, 0.51, 0.55, 0.99, 1.00, 1.05, 0.93, 1.05, 1.02, 0.82, 0.81, 0.80, 0.77, 0.90, 0.77, 0.85, 0.89, 0.83, 0.76, 0.79, 0.81, 0.59, 0.48, 0.39, 0.40, 0.46, 0.51)) # Summarize the data to output the mean, sd, min and max of thickness library(ggplot2) dsumm <- ddply(dd, .(wafer, operator), summarise, tmean = mean(thickness), tmin = min(thickness), tmax = max(thickness), tsd = sd(thickness)) # 'Multi-vari' plot: p1 <- ggplot(dd) + geom_point(aes(x = wafer, y = thickness)) + geom_errorbar(data = dsumm, aes(x = wafer, y = tmean, ymin = tmin, ymax = tmax), colour = 'blue') + geom_segment(data = dsumm, aes(x = wafer, y = tmean, yend = tmean, xend = as.numeric(wafer) + 0.2), colour = 'blue') + geom_segment(data = dsumm, aes(x = wafer, y = tmean, yend = tmean, xend = as.numeric(wafer) - 0.2), colour = 'blue') + facet_wrap( ~ operator, nrow = 1) + xlab("") # Standard deviation plot p2 <- ggplot(dsumm, aes(x = wafer, y = tsd)) + geom_point(colour = 'blue') + geom_line(aes(group = 1), size = 1, colour = 'blue') + facet_wrap( ~ operator, 
nrow = 1) # Use the gridExtra package to combine the two graphs library(gridExtra) grid.arrange(p1, p2) HTH, Dennis On Fri, May 20, 2011 at 4:12 PM, Joseph Boyer wrote: > Is there a package in R that can do a variability plot? > > A variability plot is a kind of categorized dot plot. (If there is a lot of data in each category, box plots are used rather than dot plots.) > Usually, the categories are factor level combinations. All the dot plots appear in the same window; below the x-axis a hierarchy of factors > shows which dot plot corresponds to which factor-level combination. > > Examples can be seen > http://statsoft.com/support/blog/entryid/64/user-defined-variability-plots/ > and > http://www.public.iastate.edu/~wrstephe/stat495/GaugeRR_WaferThickness_JMPOutput.pdf > > By reordering the factor names in the function call, the user can reorder the factor level combinations on the graph, making it easier > to do the visual comparisons of interest. The user should also have the option to draw line segments at factor level combination means/medians, and to connect the category means/medians to make visual comparison easier. > > The only softwares which I am aware of which produce such a plot are Statistica and JMP. I have found these plots to be more powerful than > lattice-style categorizations in their ability to allow the user to conveniently process experimental data visually. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From djmuser at gmail.com Wed Oct 5 22:13:55 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 5 Oct 2011 13:13:55 -0700 Subject: [R] Needed help with 3 factor anova !!!
In-Reply-To: <1317837369210-3875620.post@n4.nabble.com> References: <1317837369210-3875620.post@n4.nabble.com> Message-ID: Try Googling 'Three factor ANOVA R'; it didn't take long to find a few relevant hits. Dennis On Wed, Oct 5, 2011 at 10:56 AM, rafal wrote: > I am a student from Poland. What I am interested in is 3 factor anova with R. > Could you please help me find an example with using this method with R? > With all possible countable output for anova as the output presents with 3 > factor anova with spss? > I would be glad with any help. > > -- > View this message in context: http://r.789695.n4.nabble.com/Needed-help-with-3-factor-anova-tp3875620p3875620.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From batholdy at googlemail.com Wed Oct 5 22:14:11 2011 From: batholdy at googlemail.com (Martin Batholdy) Date: Wed, 5 Oct 2011 22:14:11 +0200 Subject: [R] do calculations as defined by a string / expand mathematical statements in R Message-ID: Dear R-group, is there a way to perform calculations that are defined in a string format? for example I have different variables: x1 <- 3 x2 <- 1 x4 <- 1 and a string-variable: do <- 'x1 + x2 + x3' Is there any way to perform what the variable 'do'-describes (just like the formula-element but more elemental)? Perhaps my idea to solve my problem is a little bit strange. My general problem is, that I have to do arithmetics for which there seems to be no function available that I can apply in order to be more flexible. To be precise, I have to add up three dimensional arrays. I can do that like this (as someone suggested on this help-list -
thanks for that!): (array[,,1] + array[,,2] + array[,,3]) / 3 However in my case it can happen that at some point, I don't have to add 3 but 8 'array-slices' (or 10 or x). And I don't want to manually expand the above statement to: (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + array[,,8]) / 8 (ok, now I have done it ;) So, my thinking was that I can easily expand and change a string (with the paste-function / repeat-function etc.). But how can I expand a mathematical statement? thanks for any suggestions! From baptiste.auguie at googlemail.com Wed Oct 5 22:20:21 2011 From: baptiste.auguie at googlemail.com (baptiste auguie) Date: Thu, 6 Oct 2011 09:20:21 +1300 Subject: [R] white on black theme for ggplot2 In-Reply-To: References: Message-ID: Hi, there are a couple of themes proposed in the wiki, one being white on black, https://github.com/hadley/ggplot2/wiki/Themes HTH, baptiste On 6 October 2011 04:05, Eugene Kanshin wrote: > Hello, > I'm trying to produce some plots in ggplot2 to use them on > the dark-blue gradient background. I am wondering if there is already any > theme/set of options that I can use to change the color scheme and > add transparency. > Thank you very much, > Evgeny. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From wdunlap at tibco.com Wed Oct 5 22:21:17 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 5 Oct 2011 20:21:17 +0000 Subject: [R] do calculations as defined by a string / expand mathematical statements in R In-Reply-To: References: Message-ID: Avoid parsing strings to make expressions. It is easy to do, but hard to do safely and readably.
In your case you could make a short loop out of it result <- x[,,,1] for(i in seq_len(dim(x)[4])[-1]) { result <- result + x[,,,i] } result <- result / dim(x)[4] Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Batholdy > Sent: Wednesday, October 05, 2011 1:14 PM > To: R Help > Subject: [R] do calculations as defined by a string / expand mathematical statements in R > > Dear R-group, > > > is there a way to perform calculations that are defined in a string format? > > > for example I have different variables: > > x1 <- 3 > x2 <- 1 > x4 <- 1 > > and a string-variable: > > do <- 'x1 + x2 + x3' > > > Is there any way to perform what the variable 'do'-describes > (just like the formula-element but more elemental)? > > > > Perhaps my idea to solve my problem is a little bit strange. > > > My general problem is, that I have to do arithmetics for which there seems to be no function available > that I can apply in order to be more flexible. > > > To be precise, I have to add up three dimensional arrays. > > I can do that like this (as someone suggested on this help-list - thanks for that!): > > (array[,,1] + array[,,2] + array[,,3]) / 3 > > > However in my case it can happen that at some point, I don't have to add 3 but 8 'array-slices' > (or 10 or x). > > And I don't want to manually expand the above statement to: > > (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + > array[,,8]) / 8 > > (ok, now I have done it ;) > > > > So, my thinking was that I can easily expand and change a string (with the paste-function / repeat- > function etc.). > But how can I expand a mathematical statement? > > > thanks for any suggestions! 
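For the slice-averaging problem quoted above, here is a base-R sketch that needs neither string building nor a hand-written sum. It assumes the slices to be averaged run along the third dimension, and it uses a small example array in place of the poster's data:

```r
# Example 2 x 3 x 4 array (stand-in for the real data); k = number of slices.
arr <- array(1:24, dim = c(2, 3, 4))
k <- dim(arr)[3]

# Mean over the third dimension, for any number of slices:
m_rowmeans <- rowMeans(arr, dims = 2)    # averages over dimensions 3, 4, ...
m_apply    <- apply(arr, c(1, 2), mean)  # same result via apply()

# Explicit sum of slices, mirroring (arr[,,1] + ... + arr[,,k]) / k
m_reduce   <- Reduce(`+`, lapply(seq_len(k), function(i) arr[, , i])) / k
```

With `dims = 2`, rowMeans() treats the first two dimensions as rows and averages over the rest, so the same line works for 3, 8 or k slices.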
> ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From michael.weylandt at gmail.com Wed Oct 5 22:22:29 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 16:22:29 -0400 Subject: [R] do calculations as defined by a string / expand mathematical statements in R In-Reply-To: References: Message-ID: Didn't three of us give you a function (in various flavors) that would do the mean for variable inputs, reading them from a list? (Though David's was admittedly much cooler than mine!) Anyways, look into parse(text=do) with eval() if you want to go the string route. Michael On Wed, Oct 5, 2011 at 4:14 PM, Martin Batholdy wrote: > Dear R-group, > > > is there a way to perform calculations that are defined in a string format? > > > for example I have different variables: > > x1 <- 3 > x2 <- 1 > x4 <- 1 > > and a string-variable: > > do <- 'x1 + x2 + x3' > > > Is there any way to perform what the variable 'do'-describes > (just like the formula-element but more elemental)? > > > > Perhaps my idea to solve my problem is a little bit strange. > > > My general problem is, that I have to do arithmetics for which there seems to be no function available that I can apply in order to be more flexible. > > > To be precise, I have to add up three dimensional arrays. > > I can do that like this (as someone suggested on this help-list - thanks for that!): > > (array[,,1] + array[,,2] + array[,,3]) / 3 > > > However in my case it can happen that at some point, I don't have to add 3 but 8 'array-slices' > (or 10 or x).
> > And I don't want to manually expand the above statement to: > > (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + array[,,8]) / 8 > > (ok, now I have done it ;) > > > > So, my thinking was that I can easily expand and change a string (with the paste-function / repeat-function etc.). > But how can I expand a mathematical statement? > > > thanks for any suggestions! > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From djmuser at gmail.com Wed Oct 5 22:23:33 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 5 Oct 2011 13:23:33 -0700 Subject: [R] Display a contingency table on the X11 device In-Reply-To: References: Message-ID: Hi: One option is the gridExtra package - run the example associated with the tableGrob() function. Another is the addtable2plot() function in the plotrix package. I'm pretty sure there's at least one other package that can do this; I thought it was in the gplots package, but couldn't find one that I think should apply. Perhaps others can chime in... Dennis On Wed, Oct 5, 2011 at 6:08 AM, Parker Jones wrote: > > Hello, > > I'd like to output a table to the x11 device, but I can't seem to find an easy way to do it. Specifically, I'd like to display a 2x2 contingency table alongside a graphical plot, but can only see how to output to the console. Is there a library that can do this? > > Thanks for any suggestions, > Parker > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
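If add-on packages are not an option, base graphics can also place a small table next to a plot. This sketch assumes a two-dimensional table such as one returned by table(). plot_with_table is a hypothetical helper, and the text coordinates are arbitrary choices you would adjust.

```r
# Hypothetical helper: scatterplot on the left, a 2-D table drawn as text on the right.
plot_with_table <- function(x, y, tab) {
  op <- par(mfrow = c(1, 2))
  on.exit(par(op))
  plot(x, y, main = "Data")

  # Empty panel with a 0-1 coordinate system to hold the table
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, 1))
  xs <- seq(0.45, 0.85, length.out = ncol(tab))  # column positions
  ys <- seq(0.65, 0.35, length.out = nrow(tab))  # row positions
  text(xs, 0.8, colnames(tab), font = 2)         # column labels
  text(0.15, ys, rownames(tab), font = 2)        # row labels
  for (i in seq_len(nrow(tab))) text(xs, ys[i], tab[i, ])
  title("Contingency table")
}

set.seed(1)
g1 <- factor(sample(c("no", "yes"), 40, replace = TRUE), levels = c("no", "yes"))
g2 <- factor(sample(c("A", "B"), 40, replace = TRUE), levels = c("A", "B"))
tab <- table(g1, g2)

# Written to a PDF file so the sketch runs without a display;
# interactively, call plot_with_table() on an open x11() device instead.
out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)
plot_with_table(rnorm(40), rnorm(40), tab)
invisible(dev.off())
```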
> From kw.stat at gmail.com Wed Oct 5 22:24:49 2011 From: kw.stat at gmail.com (Kevin Wright) Date: Wed, 5 Oct 2011 15:24:49 -0500 Subject: [R] Needed help with 3 factor anova !!! In-Reply-To: References: <1317837369210-3875620.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jcbouette at gmail.com Wed Oct 5 22:28:02 2011 From: jcbouette at gmail.com (Jean-Christophe BOUËTTÉ) Date: Wed, 5 Oct 2011 16:28:02 -0400 Subject: [R] do calculations as defined by a string / expand mathematical statements in R In-Reply-To: References: Message-ID: Hi, are you looking for # reproducible example x <- 1:1000 dim(x)<-rep(10,3) # code apply(x,1:2,sum) note that ?apply works with many functions... 2011/10/5 Martin Batholdy : > Dear R-group, > > > is there a way to perform calculations that are defined in a string format? > > > for example I have different variables: > > x1 <- 3 > x2 <- 1 > x4 <- 1 > > and a string-variable: > > do <- 'x1 + x2 + x3' > > > Is there any way to perform what the variable 'do'-describes > (just like the formula-element but more elemental)? > > > > Perhaps my idea to solve my problem is a little bit strange. > > > My general problem is, that I have to do arithmetics for which there seems to be no function available that I can apply in order to be more flexible. > > > To be precise, I have to add up three dimensional arrays. > > I can do that like this (as someone suggested on this help-list - thanks for that!): > > (array[,,1] + array[,,2] + array[,,3]) / 3 > > > However in my case it can happen that at some point, I don't have to add 3 but 8 'array-slices' > (or 10 or x).
> > And I don't want to manually expand the above statement to: > > (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + array[,,8]) / 8 > > (ok, now I have done it ;) > > > > So, my thinking was that I can easily expand and change a string (with the paste-function / repeat-function etc.). > But how can I expand a mathematical statement? > > > thanks for any suggestions! > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From baptiste.auguie at googlemail.com Wed Oct 5 22:29:46 2011 From: baptiste.auguie at googlemail.com (baptiste auguie) Date: Thu, 6 Oct 2011 09:29:46 +1300 Subject: [R] Display a contingency table on the X11 device In-Reply-To: References: Message-ID: On 6 October 2011 09:23, Dennis Murphy wrote: > Hi: > > One option is the gridExtra package - run the example associated with > the tableGrob() function. Another is the addtable2plot() function in > the plotrix package. I'm pretty sure there's at least one other > package that can do this; I thought it was in the gplots package, but > couldn't find one that I think should apply. Perhaps others can chime > in... it's called gplots::textplot() baptiste > > Dennis > > On Wed, Oct 5, 2011 at 6:08 AM, Parker Jones wrote: >> >> Hello, >> >> I'd like to output a table to the x11 device, but I can't seem to find an easy way to do it. Specifically, I'd like to display a 2x2 contingency table alongside a graphical plot, but can only see how to output to the console. Is there a library that can do this? >> >> Thanks for any suggestions, >> Parker >> >>
[[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From Greg.Snow at imail.org Wed Oct 5 22:30:44 2011 From: Greg.Snow at imail.org (Greg Snow) Date: Wed, 5 Oct 2011 14:30:44 -0600 Subject: [R] subplot strange behavoir In-Reply-To: <1317843603409-3875917.post@n4.nabble.com> References: <1317843603409-3875917.post@n4.nabble.com> Message-ID: When I copy and paste your code I get what is expected, the 2 subplots line up on the same y-value. What version of R are you using, which version of subplot? What platform? -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of emorway > Sent: Wednesday, October 05, 2011 1:40 PM > To: r-help at r-project.org > Subject: [R] subplot strange behavoir > > Hello, > > Below is some example code that should reproduce an error I'm > encountering > while trying to create a tiff plot with two subplots. If I run just
If I run just > the > following bit of code through the R GUI the result is what I'd like to > have > appear in the saved tiff image: > > x<-seq(0:20) > y<-c(1,1,2,2,3,4,5,4,3,6,7,1,1,2,2,3,4,5,4,3,6) > plot(x,y,type="l",las=1,ylim=c(0,12)) > subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) > subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c( > 1,1.5)) > > However, if expanding on this code with: > > edm.sub<-function(x,y){plot(x,y,col="red",frame.plot=F, > las=1,xaxs="i",yaxs="i",type="b", > ylim=c(0,6),xlab="",ylab="")} > > png("c:/temp/lookat.tif",res=120,height=600,width=1200) > layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) > plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") > plot(x,y,type="l",las=1,ylim=c(0,12)) > subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) > subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c( > 1,1.5)) > dev.off() > > One will notice the second subplot is out of position (notice the > y-coordinate is the same for both subplots...y=9): > http://r.789695.n4.nabble.com/file/n3875917/lookat.png > > If I try to 'guess' a new y-coordinate for the second subplot, say > y=10: > > png("c:/temp/lookat.tif",res=120,height=600,width=1200) > layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) > plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") > plot(x,y,type="l",las=1,ylim=c(0,12)) > subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) > subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=10,size=c > (1,1.5)) > dev.off() > > R kicks back the following message > Error in plot.new() : plot region too large > > Am I mis-using subplot? > > Thanks, Eric > > -- > View this message in context: http://r.789695.n4.nabble.com/subplot- > strange-behavoir-tp3875917p3875917.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From michael.weylandt at gmail.com Wed Oct 5 22:31:34 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 16:31:34 -0400 Subject: [R] do calculations as defined by a string / expand mathematical statements in R In-Reply-To: References: Message-ID: Actually, this may just be a typo in your first post, but if you actually want to do this calculation: (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + array[,,8]) / 8 Wouldn't this work? apply(array,3,sum)/dim(array)[3] On Wed, Oct 5, 2011 at 4:22 PM, R. Michael Weylandt wrote: > Didn't three of us give you a function (in various flavors) that would > do the mean for variable inputs, reading them from a list? (Though > David's was admittedly much cooler than mine!) > > Anyways, look into parse(text=do) with eval() if you want to go the > string route. > > Michael > > On Wed, Oct 5, 2011 at 4:14 PM, Martin Batholdy wrote: >> Dear R-group, >> >> >> is there a way to perform calculations that are defined in a string format? >> >> >> for example I have different variables: >> >> x1 <- 3 >> x2 <- 1 >> x4 <- 1 >> >> and a string-variable: >> >> do <- 'x1 + x2 + x3' >> >> >> Is there any way to perform what the variable 'do'-describes >> (just like the formula-element but more elemental)? >> >> >> >> Perhaps my idea to solve my problem is a little bit strange. >> >> >> My general problem is, that I have to do arithmetics for which there seems to be no function available that I can apply in order to be more flexible. >> >> >> To be precise, I have to add up three dimensional arrays. >> >> I can do that like this (as someone suggested on this help-list -
thanks for that!): >> >> (array[,,1] + array[,,2] + array[,,3]) / 3 >> >> >> However in my case it can happen that at some point, I don't have to add 3 but 8 'array-slices' >> (or 10 or x). >> >> And I don't want to manually expand the above statement to: >> >> (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + array[,,8]) / 8 >> >> (ok, now I have done it ;) >> >> >> >> So, my thinking was that I can easily expand and change a string (with the paste-function / repeat-function etc.). >> But how can I expand a mathematical statement? >> >> >> thanks for any suggestions! >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > From michael.weylandt at gmail.com Wed Oct 5 22:32:12 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 16:32:12 -0400 Subject: [R] do calculations as defined by a string / expand mathematical statements in R In-Reply-To: References: Message-ID: Sorry!! meant: apply(array,1:2,sum)/dim(array)[3] M On Wed, Oct 5, 2011 at 4:31 PM, R. Michael Weylandt wrote: > Actually, this may just be a typo in your first post, but if you > actually want to do this calculation: > > (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + > array[,,6] + array[,,7] + array[,,8]) / 8 > > Wouldn't this work? > > apply(array,3,sum)/dim(array)[3] > > > On Wed, Oct 5, 2011 at 4:22 PM, R. Michael Weylandt > wrote: >> Didn't three of us give you a function (in various flavors) that would >> do the mean for variable inputs, reading them from a list? (Though >> David's was admittedly much cooler than mine!) >> >> Anyways, look into parse(text=do) with eval() if you want to go the >> string route. 
>> >> Michael >> >> On Wed, Oct 5, 2011 at 4:14 PM, Martin Batholdy wrote: >>> Dear R-group, >>> >>> >>> is there a way to perform calculations that are defined in a string format? >>> >>> >>> for example I have different variables: >>> >>> x1 <- 3 >>> x2 <- 1 >>> x4 <- 1 >>> >>> and a string-variable: >>> >>> do <- 'x1 + x2 + x3' >>> >>> >>> Is there any way to perform what the variable 'do'-describes >>> (just like the formula-element but more elemental)? >>> >>> >>> >>> Perhaps my idea to solve my problem is a little bit strange. >>> >>> >>> My general problem is, that I have to do arithmetics for which there seems to be no function available that I can apply in order to be more flexible. >>> >>> >>> To be precise, I have to add up three dimensional arrays. >>> >>> I can do that like this (as someone suggested on this help-list - thanks for that!): >>> >>> (array[,,1] + array[,,2] + array[,,3]) / 3 >>> >>> >>> However in my case it can happen that at some point, I don't have to add 3 but 8 'array-slices' >>> (or 10 or x). >>> >>> And I don't want to manually expand the above statement to: >>> >>> (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + array[,,8]) / 8 >>> >>> (ok, now I have done it ;) >>> >>> >>> >>> So, my thinking was that I can easily expand and change a string (with the paste-function / repeat-function etc.). >>> But how can I expand a mathematical statement? >>> >>> >>> thanks for any suggestions! >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code.
>>> >> > From kw.stat at gmail.com Wed Oct 5 22:33:42 2011 From: kw.stat at gmail.com (Kevin Wright) Date: Wed, 5 Oct 2011 15:33:42 -0500 Subject: [R] Difficulty with lme In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djmuser at gmail.com Wed Oct 5 22:36:25 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 5 Oct 2011 13:36:25 -0700 Subject: [R] Display a contingency table on the X11 device In-Reply-To: References: Message-ID: Thanks, Baptiste. I was looking for tableplot() or something like it and thought textplot() was doing something different. Appreciate the correction. Dennis On Wed, Oct 5, 2011 at 1:29 PM, baptiste auguie wrote: > On 6 October 2011 09:23, Dennis Murphy wrote: >> Hi: >> >> One option is the gridExtra package - run the example associated with >> the tableGrob() function. Another is the addtable2plot() function in >> the plotrix package. I'm pretty sure there's at least one other >> package that can do this; I thought it was in the gplots package, but >> couldn't find one that I think should apply. Perhaps others can chime >> in... > > it's called gplots::textplot() > > baptiste >> >> Dennis >> >> On Wed, Oct 5, 2011 at 6:08 AM, Parker Jones wrote: >>> >>> Hello, >>> >>> I'd like to output a table to the x11 device, but I can't seem to find an easy way to do it. Specifically, I'd like to display a 2x2 contingency table alongside a graphical plot, but can only see how to output to the console. Is there a library that can do this? >>> >>> Thanks for any suggestions, >>> Parker >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code.
>>> >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > From jeff.breiwick at noaa.gov Wed Oct 5 22:46:32 2011 From: jeff.breiwick at noaa.gov (Jeff Breiwick) Date: Wed, 5 Oct 2011 20:46:32 +0000 Subject: [R] R CMD check Message-ID: Dear R-Group, I have a function that sorts a data frame and one of the lines in the function is: vars <- unlist(strsplit(formc, "[\\+\\-]")) The function works fine and the above line is always reached. However, when I include the function in a package and run "R CMD check pkgname" it gives this error message: '\+' is an unrecognized escape in character string starting "[\+" Execution halted I am running Win 7, R-2.13.1 and run R CMD ... from a command line, using shell ("start cmd"). I didn't write the function but it has always run fine without any errors. Is there a way to resolve this so it will pass the package check? Thank you. Jeff From emorway at usgs.gov Wed Oct 5 23:05:58 2011 From: emorway at usgs.gov (emorway) Date: Wed, 5 Oct 2011 14:05:58 -0700 (PDT) Subject: [R] subplot strange behavoir In-Reply-To: References: <1317843603409-3875917.post@n4.nabble.com> Message-ID: <1317848758658-3876178.post@n4.nabble.com> Hello Greg, Session info is below. Running Win7 64-bit. I just upgraded my version of R and tried rerunning the code and got the same odd result. I, too, get an expected result when I create the plot in the R GUI. The problem crops up only when I try and create the plot in png() or tiff(). Perhaps I need to try par(new=T), or some other trick-of-the-trade? If you're unable to recreate the problem, I suppose this is nearly a dead-end?
sessionInfo() R version 2.13.2 (2011-09-30) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods base other attached packages: [1] TeachingDemos_2.7 Hmisc_3.8-3 survival_2.36-9 loaded via a namespace (and not attached): [1] cluster_1.14.0 grid_2.13.2 lattice_0.19-33 tools_2.13.2 -- View this message in context: http://r.789695.n4.nabble.com/subplot-strange-behavoir-tp3875917p3876178.html Sent from the R help mailing list archive at Nabble.com. From batholdy at googlemail.com Wed Oct 5 23:10:19 2011 From: batholdy at googlemail.com (Martin Batholdy) Date: Wed, 5 Oct 2011 23:10:19 +0200 Subject: [R] do calculations as defined by a string / expand mathematical statements in R In-Reply-To: References: Message-ID: <2ADD9011-3437-4C06-A6EF-F90C6121DA9C@googlemail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ahoerner at rprogress.org Wed Oct 5 23:14:02 2011 From: ahoerner at rprogress.org (andrewH) Date: Wed, 5 Oct 2011 14:14:02 -0700 (PDT) Subject: [R] reporting multiple objects out of a function In-Reply-To: <1317835117366-3875488.post@n4.nabble.com> References: <1317788874982-3873380.post@n4.nabble.com> <1317835117366-3875488.post@n4.nabble.com> Message-ID: <1317849242871-3876201.post@n4.nabble.com> Thanks, Sina! This is very helpful and informative, but still not quite what I want. So, here is the thing: When a function returns an object, that object is available in the calling environment. If it is returned inside a function, it is available in the function, but not outside of the function. What I want to do is simply to return more than one object in the usual sense in which functions return objects. 
Here is a test to see if a function fun does this, at least to the depth of 1. obj1 <- 1 obj2 <- 2 cat("obj1 in global=", obj1) cat("obj2 in global=", obj2) wrapFun <- function(fun) { obj1 <- 3 obj2 <- 4 cat("obj1 in calling=", obj1) cat("obj2 in calling=", obj2) fun() cat("obj in calling=", obj) cat("obj1 in calling=", obj1) cat("obj2 in calling=", obj2) } cat("obj1 in global=", obj1) cat("obj2 in global=", obj2) Suppose the function "fun" assigns the values 5 and 6 to obj1 and obj2. If the function does what I want, this code should print: obj1 in global= 1 obj2 in global= 2 obj1 in calling= 3 obj2 in calling= 4 obj1 in calling= 5 obj2 in calling= 6 obj1 in global= 1 obj2 in global= 2 I turned Paul's and Sina's code into functions as follows: paulFun <- function() { obj1 <<- 5; obj2 <<- 6; } sinaFun <- function() { attach(what = NULL, name = "my_env") assign("obj1", 5, envir = as.environment("my_env")) assign("obj2", 6, envir = as.environment("my_env")) } Running these two functions in the code above yields: paulFun: obj1 in global= 1 obj2 in global= 2 obj1 in calling= 3 obj2 in calling= 4 obj1 in calling= 3 obj2 in calling= 4 obj1 in global= 5 obj2 in global= 6 So paulFun puts the objects in the global environment but not in the calling environment. Let's try sinaFun: sinaFun: obj1 in global= 1 obj2 in global= 2 obj1 in calling= 3 obj2 in calling= 4 obj1 in calling= 3 obj2 in calling= 4 obj1 in global= 1 obj2 in global= 2 sinaFun puts the objects in the new environment it defines, but they are available in neither the calling nor the global environment. However, I was immediately convinced that Sina had given me the tool I was missing: the assign function. (Thanks, Sina!) But I was wrong (or used it wrong), and now I am even more deeply confused.
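On the `<<-` and `assign()` attempts described above: a minimal base-R sketch of two ways to hand several values back to a caller. Returning a named list is the idiomatic route; the alternative writes into the caller's frame with `assign(..., envir = parent.frame())`. For comparison, `<<-` looks through the environments enclosing the function's definition (not its caller) and, for a function defined at top level, ends up in the global environment, which is consistent with the paulFun results. The names `makeObjs`, `assignInCaller` and `wrapFun` below are hypothetical stand-ins, not code from the thread.

```r
# Option 1 (idiomatic): bundle several results into a named list
makeObjs <- function() {
  list(obj1 = 5, obj2 = 6)
}

# Option 2: write into the environment the function was called from
assignInCaller <- function() {
  caller <- parent.frame()            # evaluation frame of whoever called us
  assign("obj1", 5, envir = caller)
  assign("obj2", 6, envir = caller)
  invisible(NULL)
}

# Hypothetical harness in the spirit of wrapFun from the post
wrapFun <- function(fun) {
  obj1 <- 3
  obj2 <- 4
  fun()                               # may overwrite obj1/obj2 in this frame
  c(obj1 = obj1, obj2 = obj2)
}

obj1 <- 1
obj2 <- 2
res <- makeObjs()
res$obj1                              # value taken from the returned list
wrapFun(assignInCaller)               # obj1/obj2 changed inside wrapFun only
c(obj1 = obj1, obj2 = obj2)           # global copies are left untouched
```

One detail that may explain surprises with `assign()`: when its `pos` argument is a positive integer it is read as a position on the search list (position 1 is the global environment), not as a call-frame number, so passing an environment through `envir` is the more predictable option.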
Here is a function that I thought would do what I want: andrewFun <- function() { assign("obj1", 5, pos = sys.parent(n = 1)) assign("obj2", 6, pos = sys.parent(n = 1)) NULL } However, when I tried it, my results were the same as paulFun: assigned in the global environment, but not in the calling environment. Setting n = 0 seemed to limit the assignment to the interior of andrewFun: none of the printed obj values were affected. Help? andrewH -- View this message in context: http://r.789695.n4.nabble.com/reporting-multiple-objects-out-of-a-function-tp3873380p3876201.html Sent from the R help mailing list archive at Nabble.com. From michael.weylandt at gmail.com Wed Oct 5 23:16:34 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Wed, 5 Oct 2011 17:16:34 -0400 Subject: [R] do calculations as defined by a string / expand mathematical statements in R In-Reply-To: <2ADD9011-3437-4C06-A6EF-F90C6121DA9C@googlemail.com> References: <2ADD9011-3437-4C06-A6EF-F90C6121DA9C@googlemail.com> Message-ID: # Changing to variable Z since array() is a function apply(Z.temp <- Z[,,,a:b],1:3,sum)/dim(Z.temp)[4] # Should work, though it may be more clear to define Z.temp in its own line M On Wed, Oct 5, 2011 at 5:10 PM, Martin Batholdy wrote: > Thanks for all the suggestions! > > > > Perhaps my post was not clear enough. > > apply(array,1:2,sum)/dim(array)[3] > > and > > # reproducible example > x <- 1:1000 > dim(x)<-rep(10,3) > # code > apply(x,1:2,sum) > > > would give me the mean over one whole dimension, right? > The problem with that is, that I just want to calculate the mean over a subset of t (where t is the 4th dimension of the array). > And the range of this subset should be easily changeable. > > > So for example I have 4D array: > > x <- 1:10000 > dim(x)<-rep(10,4) > > Now I would like to average the 3D array(x,y,z) in the 4th dimension (t) from t_start = a to t_end = b. > I don't want to average the whole 3D array. 
> > > > On 05.10.2011, at 22:21, William Dunlap wrote: > >> Avoid parsing strings to make expressions. It is easy >> to do, but hard to do safely and readably. >> >> In your case you could make a short loop out of it >> result <- x[,,,1] >> for(i in seq_len(dim(x)[4])[-1]) { >> result <- result + x[,,,i] >> } >> result <- result / dim(x)[4] >> >> Bill Dunlap >> Spotfire, TIBCO Software >> wdunlap tibco.com > > > Wouldn't that be much slower than defining a string and evaluating it as an expression, since I would have to use a for-loop? > > > > > thanks again! > You helped me a lot today ;) > > > > > > On 05.10.2011, at 22:21, William Dunlap wrote: > >> Avoid parsing strings to make expressions. It is easy >> to do, but hard to do safely and readably. >> >> In your case you could make a short loop out of it >> result <- x[,,,1] >> for(i in seq_len(dim(x)[4])[-1]) { >> result <- result + x[,,,i] >> } >> result <- result / dim(x)[4] >> >> Bill Dunlap >> Spotfire, TIBCO Software >> wdunlap tibco.com >> >>> -----Original Message----- >>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Batholdy >>> Sent: Wednesday, October 05, 2011 1:14 PM >>> To: R Help >>> Subject: [R] do calculations as defined by a string / expand mathematical statements in R >>> >>> Dear R-group, >>> >>> >>> is there a way to perform calculations that are defined in a string format? >>> >>> >>> for example I have different variables: >>> >>> x1 <- 3 >>> x2 <- 1 >>> x4 <- 1 >>> >>> and a string-variable: >>> >>> do <- 'x1 + x2 + x3' >>> >>> >>> Is there any way to perform what the variable 'do'-describes >>> (just like the formula-element but more elemental)? >>> >>> >>> >>> Perhaps my idea to solve my problem is a little bit strange. >>> >>> >>> My general problem is, that I have to do arithmetics for which there seems to be no function available >>> that I can apply in order to be more flexible.
>>> >>> >>> To be precise, I have to add up three dimensional arrays. >>> >>> I can do that like this (as someone suggested on this help-list - thanks for that!): >>> >>> (array[,,1] + array[,,2] + array[,,3]) / 3 >>> >>> >>> However in my case it can happen that at some point, I don't have to add 3 but 8 'array-slices' >>> (or 10 or x). >>> >>> And I don't want to manually expand the above statement to: >>> >>> (array[,,1] + array[,,2] + array[,,3] + array[,,4] + array[,,5] + array[,,6] + array[,,7] + >>> array[,,8]) / 8 >>> >>> (ok, now I have done it ;) >>> >>> >>> >>> So, my thinking was that I can easily expand and change a string (with the paste-function / repeat- >>> function etc.). >>> But how can I expand a mathematical statement? >>> >>> >>> thanks for any suggestions! >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From murdoch.duncan at gmail.com Wed Oct 5 23:20:30 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 5 Oct 2011 17:20:30 -0400 Subject: [R] R CMD check In-Reply-To: References: Message-ID: <4E8CCA1E.5060404@gmail.com> On 05/10/2011 4:46 PM, Jeff Breiwick wrote: > Dear R-Group, > > I have a function that sorts a data frame and one of the lines in the > function is: > > vars<- unlist(strsplit(formc, "[\\+\\-]")) > > The function works fine and the above line is always reached. However, when I
However, when I > include the function in a package and run "R CMD check pkgname" it gives this > error message: > > '\+' is an unrecognized escape in character string starting "[\+" > Execution halted > > I am running Win 7, R-2.13.1 and run R CMD ... from a command line, using > shell ("start cmd"). > > I didn't write the function but it has always run fine without any errors. > > Is there a way to resolve this so it will pass the package check? That looks strange. Can you put your package source online somewhere so we could try it? Duncan Murdoch From djnordlund at frontier.com Wed Oct 5 23:25:41 2011 From: djnordlund at frontier.com (Daniel Nordlund) Date: Wed, 5 Oct 2011 14:25:41 -0700 Subject: [R] do calculations as defined by a string / expandmathematical statements in R In-Reply-To: <2ADD9011-3437-4C06-A6EF-F90C6121DA9C@googlemail.com> References: <2ADD9011-3437-4C06-A6EF-F90C6121DA9C@googlemail.com> Message-ID: <959A85E252E945E09E59F80BD207F465@Gandalf> > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On Behalf Of Martin Batholdy > Sent: Wednesday, October 05, 2011 2:10 PM > To: R Help > Subject: Re: [R] do calculations as defined by a string / > expandmathematical statements in R > > Thanks for all the suggestions! > > > > Perhaps my post was not clear enough. > > apply(array,1:2,sum)/dim(array)[3] > > and > > # reproducible example > x <- 1:1000 > dim(x)<-rep(10,3) > # code > apply(x,1:2,sum) > > > would give me the mean over one whole dimension, right? > The problem with that is, that I just want to calculate the mean over a > subset of t (where t is the 4th dimension of the array). > And the range of this subset should be easily changeable. > > > So for example I have 4D array: > > x <- 1:10000 > dim(x)<-rep(10,4) > > Now I would like to average the 3D array(x,y,z) in the 4th dimension (t) > from t_start = a to t_end = b. > I don't want to average the whole 3D array. 
> I don't have specific advice to address your question, but I do have an observation on this series of posts. It is often the case that people new to R try to program like they would if they were using C, or SAS, or ... whatever they are used to. I can't help but think that if you provided some context for what your tasks are you might find that someone on the list is working in the same area and could provide advice tailored to what you need to do. Why, there might even be a package or two that already provide the functionality that you need. So, broadly speaking, what do these multidimensional arrays represent and what are you trying to do with them? Dan Daniel Nordlund Bothell, WA USA From christopher.a.hane at gmail.com Wed Oct 5 23:09:41 2011 From: christopher.a.hane at gmail.com (Chris) Date: Wed, 5 Oct 2011 21:09:41 +0000 Subject: [R] Party extract BinaryTree from cforest? References: Message-ID: I found an internal workaround to this to support printing and plot type simple, tt<-party:::prettytree(cf@ensemble[[1]], names(cf@data@get("input"))) > npt <- new("BinaryTree") > npt@tree<-tt > plot(npt) Error in terminal_panel() : 'ctreeobj' is not a regression tree > plot(npt, type="simple") Any additional help is appreciated. From emorway at usgs.gov Wed Oct 5 23:35:23 2011 From: emorway at usgs.gov (emorway) Date: Wed, 5 Oct 2011 14:35:23 -0700 (PDT) Subject: [R] subplot strange behavoir In-Reply-To: References: <1317843603409-3875917.post@n4.nabble.com> Message-ID: <1317850523252-3876285.post@n4.nabble.com> I tried this trick, and clearly things are not going in the right direction. It seems 'layout' is at the root of my frustration, so I can make two plots and merge them in Adobe Illustrator (or something similar).
png("c:/temp/lookat.png",res=120,height=600,width=1200) layout(matrix(c(1,2),2,2,byrow=TRUE),c(1.5,2.5),respect=TRUE) plot(seq(1:10),seq(1:10),type="l",las=1,col="blue") plot(x,y,type="l",las=1,ylim=c(0,12)) subplot(edm.sub(x[seq(1:5)],y[seq(1:5)]),x=4,y=9,size=c(1,1.5)) par(new=T) plot(x,y,las=1,ylim=c(0,12),type="n",ann=F,las=1,col="blue") subplot(edm.sub(x[seq(15,20,by=1)],y[seq(15,20,by=1)]),x=17,y=9,size=c(1,1.5)) dev.off() http://r.789695.n4.nabble.com/file/n3876285/lookat.png -- View this message in context: http://r.789695.n4.nabble.com/subplot-strange-behavoir-tp3875917p3876285.html Sent from the R help mailing list archive at Nabble.com. From jeff.breiwick at noaa.gov Wed Oct 5 23:37:31 2011 From: jeff.breiwick at noaa.gov (Jeff Breiwick) Date: Wed, 5 Oct 2011 21:37:31 +0000 Subject: [R] R CMD check References: Message-ID: Jeff Breiwick noaa.gov> writes: > > Dear R-Group, > > I have a function that sorts a data frame and one of the lines in the > function is: > > vars <- unlist(strsplit(formc, "[\\+\\-]")) > > The function works fine and the above line is always reached. However, when I > include the function in a package and run "R CMD check pkgname" it gives this > error message: > > '\+' is an unrecognized escape in character string starting "[\+" > Execution halted > > I am running Win 7, R-2.13.1 and run R CMD ... from a command line, using > shell ("start cmd"). > > I didn't write the function but it has always run fine without any errors. > > Is there a way to resolve this so it will pass the package check? > Thank you. > > Jeff > To test 'R CMD check' I created a test package with only this one function in it. After running package.skeleton() the only thing I did was to edit DESCRIPTION, test-package.Rd and sort.data.frame.Rd. The offending line is 13: sort.data.frame <- function (x, by) { # From R Help list, by Kevin Wright, Sept. 2004 # Sorts data.frame by columns, either ascending or descending # e.g.
sort.data.frame(x, by = ~ year + fishery + area) ############################################################## if (by[[1]] != "~") stop("Argument 'by' must be a one-sided formula.") formc <- as.character(by[2]) formc <- gsub(" ", "", formc) if (!is.element(substring(formc, 1, 1), c("+", "-"))) formc <- paste("+", formc, sep = "") vars <- unlist(strsplit(formc, "[\\+\\-]")) vars <- vars[vars != ""] calllist <- list() pos <- 1 for (i in 1:length(vars)) { varsign <- substring(formc, pos, pos) pos <- pos + 1 + nchar(vars[i]) if (is.factor(x[, vars[i]])) { if (varsign == "-") { calllist[[i]] <- -rank(x[, vars[i]]) } else { calllist[[i]] <- rank(x[, vars[i]]) } } else { if (varsign == "-") { calllist[[i]] <- -x[, vars[i]] } else { calllist[[i]] <- x[, vars[i]] } } } return(x[do.call("order", calllist), ]) } -Jeff From rmh at temple.edu Wed Oct 5 23:48:39 2011 From: rmh at temple.edu (Richard M. Heiberger) Date: Wed, 5 Oct 2011 17:48:39 -0400 Subject: [R] R CMD check In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jeff.breiwick at noaa.gov Wed Oct 5 23:56:17 2011 From: jeff.breiwick at noaa.gov (Jeff Breiwick) Date: Wed, 5 Oct 2011 21:56:17 +0000 Subject: [R] R CMD check References: Message-ID: Richard M. Heiberger temple.edu> writes: > > The next thing to check is this item from doc/manual/R-exts.html > > Quoted strings within R-like text are handled specially... > > My guess is that the problem is occurring in the .Rd file, not in the .R > file. > > Remove the line, or double the "\" characters. > > Rich Yes, the error appears to be in the .Rd file so I will modify that. Thanks.
Jeff From carl at witthoft.com Thu Oct 6 00:03:34 2011 From: carl at witthoft.com (Carl Witthoft) Date: Wed, 05 Oct 2011 18:03:34 -0400 Subject: [R] dynamically creating functions in r Message-ID: <4E8CD436.5010902@witthoft.com> Another way to build functions "from scratch" : > func<-'x^2+5' > funcderiv<- D(parse(text=func), 'x') > newtparam <- function(zvar) {} > body(newtparam)[2] <- parse(text=paste('newz <- (',func,')/eval(funcderiv)',collapse='')) > body(newtparam)[3] <- parse(text=paste('return(invisible(newz))')) > newtparam function (zvar) { newz <- (x^2 + 5)/eval(funcderiv) return(invisible(newz)) } The important thing to know if you go with this method is that body(your_function)[1] is always "{}" Carl -- ----- Sent from my Cray XK6 From donaldngwe at gmail.com Wed Oct 5 23:27:24 2011 From: donaldngwe at gmail.com (darkgaze) Date: Wed, 5 Oct 2011 14:27:24 -0700 (PDT) Subject: [R] Subsetting question Message-ID: <1317850044902-3876252.post@n4.nabble.com> Hi all, Suppose I have data1 A B 1 a 1 b 2 c 2 d and data2 D E F x y 1 w z 2 and I want data2 D E F G x y 1 a,b w z 3 c,d I am trying data2$G=list(data1$B[data1$A==data2$F,]) How do I correct this approach? -- View this message in context: http://r.789695.n4.nabble.com/Subsetting-question-tp3876252p3876252.html Sent from the R help mailing list archive at Nabble.com. From this.is.mvw at gmail.com Thu Oct 6 01:38:06 2011 From: this.is.mvw at gmail.com (Mike Williamson) Date: Wed, 5 Oct 2011 16:38:06 -0700 Subject: [R] any way to convert back to DateTime class when "accidental" conversion to numeric? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From patmmccann at gmail.com Wed Oct 5 22:15:18 2011 From: patmmccann at gmail.com (Patrick McCann) Date: Wed, 5 Oct 2011 16:15:18 -0400 Subject: [R] unique possible bug Message-ID: Hi, I am trying to read in a rather large list of transactions using the arules library.
It seems in the coerce method into the dgCMatrix, it somewhere calls unique. Unique.c throws an error when n > 536870912; however, when 4*n was modified to 2*n in 2004, the overflow protection should have changed from 2^29 to 2^30, right? If so, how would I change it in my copy? Do I have to recompile everything? Thanks, Patrick McCann Here is a simple to reproduce example: > runif(2^29+5)->a > sum(unique(a))->b Error in unique.default(a) : length 536870917 is too large for hashing > traceback() 3: unique.default(a) 2: unique(a) 1: unique(a) > unique.default function (x, incomparables = FALSE, fromLast = FALSE, ...) { z <- .Internal(unique(x, incomparables, fromLast)) if (is.factor(x)) factor(z, levels = seq_len(nlevels(x)), labels = levels(x), ordered = is.ordered(x)) else if (inherits(x, "POSIXct")) structure(z, class = class(x), tzone = attr(x, "tzone")) else if (inherits(x, "Date")) structure(z, class = class(x)) else z } From http://svn.r-project.org/R/trunk/src/main/unique.c I see: /* Choose M to be the smallest power of 2 not less than 2*n and set K = log2(M). Need K >= 1 and hence M >= 2, and 2^M <= 2^31 -1, hence n <= 2^29. Dec 2004: modified from 4*n to 2*n, since in the worst case we have a 50% full table, and that is still rather efficient -- see R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
*/ static void MKsetup(int n, HashData *d) { int n4 = 2 * n; if(n < 0 || n > 536870912) /* protect against overflow to -ve */ error(_("length %d is too large for hashing"), n); d->M = 2; d->K = 1; while (d->M < n4) { d->M *= 2; d->K += 1; } } From Sam_Smith at me.com Wed Oct 5 22:08:22 2011 From: Sam_Smith at me.com (Sam) Date: Wed, 05 Oct 2011 21:08:22 +0100 Subject: [R] Dealing with proportions Message-ID: <33F6DCED-D330-4148-A9CB-1CF6B3E3D009@me.com> Dear list, I have very little experience in dealing with proportions. I am sure this is a very simple question, but I could find no suitable answer beyond doing a chi-sq test and then using the Marascuilo procedure as a post-hoc analysis. I am simply wanting to know if the proportions (i.e. the number of Yes / No) significantly differ between the cases and if so which cases are significantly high or low? proportion <- structure(list(Case = structure(1:11, .Label = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K"), class = "factor"), Yes = c(18L, 2L, 1L, 2L, 44L, 27L, 2L, 15L, 13L, 3L, 34L), No = c(171L, 11L, 5L, 8L, 146L, 80L, 5L, 30L, 22L, 5L, 42L), Num = c(189L, 13L, 6L, 10L, 190L, 107L, 7L, 45L, 35L, 8L, 76L)), .Names = c("Case", "Yes", "No", "Num"), class = "data.frame", row.names = c(NA, -11L)) Thanks in advance Sam
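A minimal base-R sketch of a first pass at the proportions question above, rebuilding the `proportion` data from the post. `prop.test()` runs the chi-squared test of equal proportions across all 11 cases, and `pairwise.prop.test()` compares every pair of cases with a multiplicity adjustment (Holm here). This is a swapped-in technique, not the Marascuilo procedure the post mentions. With counts this small, R warns that the chi-squared approximation may be inaccurate (the warnings are suppressed below only to keep the output short), so an exact or simulated test, for example `fisher.test()` with `simulate.p.value = TRUE`, may be worth considering.

```r
proportion <- data.frame(
  Case = factor(LETTERS[1:11]),
  Yes  = c(18L, 2L, 1L, 2L, 44L, 27L, 2L, 15L, 13L, 3L, 34L),
  No   = c(171L, 11L, 5L, 8L, 146L, 80L, 5L, 30L, 22L, 5L, 42L),
  Num  = c(189L, 13L, 6L, 10L, 190L, 107L, 7L, 45L, 35L, 8L, 76L)
)

# Name the success counts so the pairwise table is labelled by case
yes <- setNames(proportion$Yes, as.character(proportion$Case))

# Overall test: is the proportion of "Yes" the same in every case?
overall <- suppressWarnings(prop.test(yes, proportion$Num))
overall$p.value

# All pairwise comparisons, p-values adjusted with Holm's method
pairs <- suppressWarnings(
  pairwise.prop.test(yes, proportion$Num, p.adjust.method = "holm")
)
round(pairs$p.value, 3)   # lower-triangular matrix of adjusted p-values
```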
From david.wiley at gmail.com Thu Oct 6 00:50:27 2011 From: david.wiley at gmail.com (David Wiley) Date: Wed, 5 Oct 2011 16:50:27 -0600 Subject: [R] Help with wireframe graphics problem (newbie) Message-ID: All, I've read several tutorials re: generating wireframes, but am clearly missing something. I have data along the lines of: > tbl [1:10,] Visits Activity Course.Grade 1 17 2 18.31 2 7 11 20.67 3 9 17 24.69 4 28 71 38.72 5 43 107 45.46 6 14 5 47.77 7 25 51 57.81 8 32 27 59.46 9 60 43 64.39 10 31 46 66.26 When I run the command: wireframe(Course.Grade ~ Visits * Activity, data = tbl) I get an appropriately labeled, but empty, cube. Can anyone tell me why there's no data in my cube? Thanks in advance, David From evap4442 at gmail.com Thu Oct 6 01:45:05 2011 From: evap4442 at gmail.com (Eva Powers) Date: Wed, 5 Oct 2011 16:45:05 -0700 Subject: [R] aggregate function with a dataframe for both "x" and "by" Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From connerpharmd at yahoo.com Thu Oct 6 02:24:16 2011 From: connerpharmd at yahoo.com (Chris Conner) Date: Wed, 5 Oct 2011 17:24:16 -0700 (PDT) Subject: [R] Issue with read.csv treatment of numerics enclosed in quotes (and a confession) Message-ID: <1317860656.77753.YahooMailNeo@web160712.mail.bf1.yahoo.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From mdsumner at gmail.com Thu Oct 6 02:36:56 2011 From: mdsumner at gmail.com (Michael Sumner) Date: Thu, 6 Oct 2011 11:36:56 +1100 Subject: [R] kriging shapefiles In-Reply-To: <1317805629.20625.YahooMailNeo@web110713.mail.gq1.yahoo.com> References: <1317805629.20625.YahooMailNeo@web110713.mail.gq1.yahoo.com> Message-ID: It is perhaps easier than you think, see the example in gstat: library(gstat) ?krige After this is run "meuse" is a SpatialPointsDataFrame: coordinates(meuse) = ~x+y Any point shapefile read with readOGR from rgdal (or the alternative functions in maptools) will also be SpatialPointsDataFrames, and so the code will work much the same. To use other kriging functions that perhaps use data.frames, just use as.data.frame(x) to convert a SpatialPointsDataFrame to the non-spatial version. Cheers, Mike. On Wed, Oct 5, 2011 at 8:07 PM, Leynnard Rey Matillano wrote: > Hi! I'm new to R and I need to interpolate a shapefile using kriging. I've been able to plot/read the shapefile using the package maptools or rgdal. I've searched the internet for sample codes but most of the kriging codes that I've found done in R is done using txtfiles or CSVs. An example could be of great help. Thanks. > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
--
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsumner at gmail.com

From sarah.goslee at gmail.com Thu Oct 6 02:41:35 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 5 Oct 2011 20:41:35 -0400 Subject: [R] Issue with read.csv treatment of numerics enclosed in quotes (and a confession) In-Reply-To: <1317860656.77753.YahooMailNeo@web160712.mail.bf1.yahoo.com> References: <1317860656.77753.YahooMailNeo@web160712.mail.bf1.yahoo.com> Message-ID:
Hi Chris,

Yes, you're missing something: the colClasses argument to read.csv.

Given a tiny little csv file that looks like this:

1,2,3,"01234"
4,5,6,"00011"
7,8,0,"00000"

> testdata <- read.csv("testdata.csv", header=FALSE, colClasses=c(NA, NA, NA, "character"))
> testdata
  V1 V2 V3    V4
1  1  2  3 01234
2  4  5  6 00011
3  7  8  0 00000
> str(testdata)
'data.frame': 3 obs. of 4 variables:
 $ V1: int 1 4 7
 $ V2: int 2 5 8
 $ V3: int 3 6 0
 $ V4: chr "01234" "00011" "00000"

That should do what you want.

Not that you should need it, but sprintf() is a neater way to pad numeric values out to character:

> sprintf("%05d", 12)
[1] "00012"
> sprintf("%05d", 1223)
[1] "01223"
> sprintf("%07d", 12)
[1] "0000012"

Hope that solves your problem,
Sarah

On Wed, Oct 5, 2011 at 8:24 PM, Chris Conner wrote:
> Dear Help-Rs,
>
> I've been dealing with this problem for some time, using a work-around to deal with it. It's time for me to come clean with my ineptitude and seek what has got to be a more streamlined solution from the Help-Rverse.
>
> I regularly import delimited text data that contains numerics enclosed in quotes (e.g., "00765288071"). Thing is, for some of these data, I need to keep the values as "character" class within the data frame (that is to say, the leading zeros are important and I would like them to stay).
> Here is an example of the code I would use to read the dataset in question:
>
> mydata <- read.csv("~/mydata.csv", quote = "\"'")
>
> The problem is, when R reads the data and converts them into a data frame, R inevitably ignores the quotes around values like the above and reads them in as "numeric". So R strips the valuable leading zeros and converts my "00765288071" to 765288071. I've developed a work-around to this involving the following:
>
>> whatIneed <- "00000000000"
>> whatIgot <- 765288071
>> whatIgot <- as.character(whatIgot)
>> substr(whatIneed, 1+nchar(whatIneed)-nchar(whatIgot), nchar(whatIneed)) <- whatIgot
>> whatIneed
> [1] "00765288071"
>
> My question is, am I missing something in how I'm writing my read.csv statement that would indicate to R that numerics enclosed in quotes should be read and imported as characters and not converted to numerics?

--
Sarah Goslee
http://www.functionaldiversity.org

From rhelpacc at gmail.com Thu Oct 6 03:08:06 2011 From: rhelpacc at gmail.com (Robert A'gata) Date: Wed, 5 Oct 2011 21:08:06 -0400 Subject: [R] [R-SIG-Finance] AsOf join in R In-Reply-To: References: Message-ID:
Hi Roupell,

Yes I am aware of RTAQ function matchTradesQuotes. But my time series does not follow the TAQ format like they suggest. So I gave it a try and find that it doesn't work. In particular, my time series contain full level 2 order book and trades. I want to do asof join of the book to the trade. Is there any better way? Or I misunderstand anything about the RTAQ's function? Thank you.

Robert

On Wed, Oct 5, 2011 at 1:59 AM, Roupell, Darko wrote:
> na.locf {zoo}, should do a job, also if you look in RTAQ they wrote a function that looks on previous tick and carries forward value in case its not available.
>
> __________________________________________________
> Commonwealth Bank
> Darko Roupell
> Associate Quantitative Analyst
> Institutional Banking & Markets
> Equities Research
> Darling Park Tower 1
> Level 23, 201 Sussex Street
> Sydney, NSW 200
> P: +61 2 9117 1254
> F: +61 2 9118 1000
> M: +61 400 170 515
> E: Darko.Roupell at cba.com.au
> Our vision is to be Australia's finest financial services organisation through excelling in customer service.
>
> Email Security
> This email is sent solely for informational purposes. Hoax emails, commonly referred to as phishing, can appear to be from the Commonwealth Bank and ask you to update or confirm details such as client numbers, passwords, personal identification questions, contact details or account numbers. The Commonwealth Bank will never send you an email asking you to confirm, update or reveal your confidential banking information.
> Important Information
> Produced by Global Markets Research, a business unit of Commonwealth Bank of Australia ABN 48 123 123 124 - AFSL 234945 (Commonwealth Bank). This publication is based on information available at the time of publishing. We believe that the information in this communication is correct and any opinions, conclusions or recommendations are reasonably held or made as at the time of its compilation, but no warranty is made as to accuracy, reliability or completeness. To the extent permitted by law, neither Commonwealth Bank nor any of its subsidiaries accept liability to any person for loss or damage arising from the use of this communication. This communication does not purport to be a complete statement or summary.
> The information provided has been prepared without considering your objectives, financial situation or needs, and before acting on the information, you should consider its appropriateness to your circumstances.
> No person should act on the basis of this report without considering and if necessary taking appropriate professional advice upon their own particular circumstances.
> Commonwealth Bank of Australia, as a provider of investment, borrowing and other financial services undertakes financial transactions with many corporate entities in Australia. This may include any corporate issuer referred to in this communication. Commonwealth Bank and its subsidiaries have effected or may effect transactions for their own account in any investments or related investments referred to herein. In the case of certain securities Commonwealth Bank is or may be the only market maker.
>
> -----Original Message-----
> From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Robert A'gata
> Sent: Wednesday, 5 October 2011 2:41 PM
> To: r-help at r-project.org; r-sig-finance at stat.math.ethz.ch
> Subject: [R-SIG-Finance] AsOf join in R
>
> Hi,
>
> I tried to google for any solution for asof join operator in R. But I
> couldn't find one. The asof join operator AsOf(A,B) merges 2 time
> series by looking for latest available value of B prior to each time
> point in A. For example,
>
> A <- xts(c(10,15,20,25),
> order.by=as.POSIXct(c("2011-09-01","2011-09-09","2011-09-10","2011-09-15")))
>
> B <- xts(c(1.1,1.5,1.3,1.7),
> order.by=as.POSIXct(c("2011-08-31","2011-09-09","2011-09-11","2011-09-12")))
>
> AsOf(A,B) should return
>
>              A    B
> 2011-09-01  10  1.1
> 2011-09-09  15  1.1   # (because latest value B prior to 2011-09-09 is 1.1)
> 2011-09-10  20  1.5
> 2011-09-15  25  1.7
>
> How do I write the above AsOf function in R? The merge function does
> not do what I want because it will align points that have the same
> time stamp together while what I want is actually latest value prior
> to timestamp in A. Any example would be greatly appreciated. Thank
> you.
>
> Cheers,
>
> Robert
>
> _______________________________________________
> R-SIG-Finance at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> -- Subscriber-posting only. If you want to post, subscribe first.
> -- Also note that this is not the r-help list where general R questions should go.
>
> ************** IMPORTANT MESSAGE *****************************
> This e-mail message is intended only for the addressee(s) and contains information which may be
> confidential.
> If you are not the intended recipient please advise the sender by return email, do not use or
> disclose the contents, and delete the message and any attachments from your system. Unless
> specifically indicated, this email does not constitute formal advice or commitment by the sender
> or the Commonwealth Bank of Australia (ABN 48 123 123 124) or its subsidiaries.
> We can be contacted through our web site: commbank.com.au.
> If you no longer wish to receive commercial electronic messages from us, please reply to this
> e-mail by typing Unsubscribe in the subject line.
> **************************************************************

From jholtman at gmail.com Thu Oct 6 03:12:00 2011 From: jholtman at gmail.com (jim holtman) Date: Wed, 5 Oct 2011 21:12:00 -0400 Subject: [R] any way to convert back to DateTime class when "accidental" conversion to numeric? In-Reply-To: References: Message-ID:
Here is what I use:

unix2POSIXct <- function(time) structure(time, class = c("POSIXct", "POSIXt"))

> unix2POSIXct(1317857320)
[1] "2011-10-05 19:28:40 EDT"

On Wed, Oct 5, 2011 at 7:38 PM, Mike Williamson wrote:
> Hi,
>
> In short, I would like to know if there is any way to convert a numeric
> into a date, similar to how strptime() can convert a string to a date-time
> class?
>
> There are some functions, etc. which don't work well with dates, and
> tend to force them into numerics.
> I understand that the number it spits
> back is the number of seconds since the beginning of 1970 (see the first few
> sentences of the "Details" portion of ?DateTimeClasses).
> However, it's a bit of a hassle to convert that by hand. I can create a
> function to do this, and it isn't so hard, but I found it hard to believe
> such a function didn't already exist, so I wanted to ask the community.
>
> As an example, today (Oct 5th 2011 at approximately 4:30pm, Pacific
> time) is approximately 1317857320 as a numeric, but I would like to know how
> to go from that number back to the "2011-10-05 16:28:39 PDT" date-time class
> which originally generated it.
>
> Thanks!
> Mike
>
> ---
> XKCD

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
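Base R can already do the conversion Mike asks about. A short sketch: as.POSIXct() on a number with an explicit origin (older R versions require the origin argument), and the base helper .POSIXct(), which attaches the c("POSIXct", "POSIXt") class to the number much as unix2POSIXct() does. The time zones used are illustrative choices.

```r
secs <- 1317857320  # seconds since 1970-01-01 00:00:00 UTC, from the question

# Numeric -> POSIXct with an explicit origin
t1 <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# .POSIXct() just wraps the number with the POSIXct class
t2 <- .POSIXct(secs, tz = "UTC")

format(t1, "%Y-%m-%d %H:%M:%S %Z")

# The same instant shown in US Pacific time, as in the original question
format(t1, tz = "America/Los_Angeles", usetz = TRUE)
```

Because both objects store the same number of seconds, t1 and t2 compare equal; only the tz attribute controls how they print.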
From jholtman at gmail.com Thu Oct 6 03:18:49 2011 From: jholtman at gmail.com (jim holtman) Date: Wed, 5 Oct 2011 21:18:49 -0400 Subject: [R] Subsetting question In-Reply-To: <1317850044902-3876252.post@n4.nabble.com> References: <1317850044902-3876252.post@n4.nabble.com> Message-ID:
Does this do what you want:

> data1
  A B
1 1 a
2 1 b
3 2 c
4 2 d
> data2
  D E F
1 x y 1
2 w z 2
> data1.1 <- aggregate(data1$B, list(data1$A), FUN=paste, collapse=',')
> data1.1
  Group.1   x
1       1 a,b
2       2 c,d
> merge(data2, data1.1, by.x="F", by.y="Group.1")
  F D E   x
1 1 x y a,b
2 2 w z c,d

On Wed, Oct 5, 2011 at 5:27 PM, darkgaze wrote:
> Hi all,
>
> Suppose I have
>
> data1
> A B
> 1 a
> 1 b
> 2 c
> 2 d
>
> and
>
> data2
> D E F
> x y 1
> w z 2
>
> and I want
>
> data2
> D E F G
> x y 1 a,b
> w z 2 c,d
>
> I am trying
>
> data2$G=list(data1$B[data1$A==data2$F,])
>
> How do I correct this approach?
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Subsetting-question-tp3876252p3876252.html
> Sent from the R help mailing list archive at Nabble.com.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

From rhelpacc at gmail.com Thu Oct 6 03:27:41 2011 From: rhelpacc at gmail.com (Robert A'gata) Date: Wed, 5 Oct 2011 21:27:41 -0400 Subject: [R] [R-SIG-Finance] AsOf join in R In-Reply-To: References: Message-ID:
Darko - Looking at it carefully. Yes, you're right. It's still native R code function. I know how to proceed now. Thanks.

On Wed, Oct 5, 2011 at 9:22 PM, Roupell, Darko wrote:
> Btw tweaking MatchTradesQuotes should not be an issue and its easy to accommodate any data format that is passed through in xts object.
>
> At least that's what I did with RTAQ package - used it as a shell to create unique functions that suit data format for intra-day tick flow from ASX.
>
> Hope this helps.
>
> -----Original Message-----
> From: Robert A'gata [mailto:rhelpacc at gmail.com]
> Sent: Thursday, 6 October 2011 12:08 PM
> To: Roupell, Darko; r-help at r-project.org; r-sig-finance at stat.math.ethz.ch
> Subject: Re: [R-SIG-Finance] AsOf join in R
>
> Hi Roupell,
>
> Yes I am aware of RTAQ function matchTradesQuotes. But my time series
> does not follow the TAQ format like they suggest. So I gave it a try
> and find that it doesn't work. In particular, my time series contain
> full level 2 order book and trades. I want to do asof join of the book
> to the trade. Is there any better way? Or I misunderstand anything
> about the RTAQ's function? Thank you.
>
> Robert
>
> On Wed, Oct 5, 2011 at 1:59 AM, Roupell, Darko wrote:
>> na.locf {zoo}, should do a job, also if you look in RTAQ they wrote a function that looks on previous tick and carries forward value in case its not available.
>>
>> -----Original Message-----
>> From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Robert A'gata
>> Sent: Wednesday, 5 October 2011 2:41 PM
>> To: r-help at r-project.org; r-sig-finance at stat.math.ethz.ch
>> Subject: [R-SIG-Finance] AsOf join in R
>>
>> Hi,
>>
>> I tried to google for any solution for asof join operator in R. But I
>> couldn't find one. The asof join operator AsOf(A,B) merges 2 time
>> series by looking for latest available value of B prior to each time
>> point in A. For example,
>>
>> A <- xts(c(10,15,20,25),
>> order.by=as.POSIXct(c("2011-09-01","2011-09-09","2011-09-10","2011-09-15")))
>>
>> B <- xts(c(1.1,1.5,1.3,1.7),
>> order.by=as.POSIXct(c("2011-08-31","2011-09-09","2011-09-11","2011-09-12")))
>>
>> AsOf(A,B) should return
>>
>>              A    B
>> 2011-09-01  10  1.1
>> 2011-09-09  15  1.1   # (because latest value B prior to 2011-09-09 is 1.1)
>> 2011-09-10  20  1.5
>> 2011-09-15  25  1.7
>>
>> How do I write the above AsOf function in R? The merge function does
>> not do what I want because it will align points that have the same
>> time stamp together while what I want is actually latest value prior
>> to timestamp in A. Any example would be greatly appreciated. Thank
>> you.
>>
>> Cheers,
>>
>> Robert
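For the as-of join Robert describes, base R's findInterval() can locate, for each timestamp in A, the last timestamp in B that is strictly earlier. A minimal sketch using plain POSIXct vectors rather than xts objects; asof_join() is a hypothetical helper name, the UTC time zone is an illustrative choice, and B's timestamps are assumed sorted in ascending order.

```r
# As-of join: for each time in tA, take the latest value of B whose
# timestamp is strictly earlier (tB must be sorted ascending).
asof_join <- function(tA, vA, tB, vB) {
  # left.open = TRUE counts the tB values strictly below each tA,
  # i.e. the index of the last strictly-earlier B observation (0 if none)
  idx <- findInterval(as.numeric(tA), as.numeric(tB), left.open = TRUE)
  B <- rep(NA_real_, length(tA))
  B[idx > 0] <- vB[idx[idx > 0]]
  data.frame(time = tA, A = vA, B = B)
}

tA <- as.POSIXct(c("2011-09-01", "2011-09-09", "2011-09-10", "2011-09-15"), tz = "UTC")
tB <- as.POSIXct(c("2011-08-31", "2011-09-09", "2011-09-11", "2011-09-12"), tz = "UTC")

res <- asof_join(tA, c(10, 15, 20, 25), tB, c(1.1, 1.5, 1.3, 1.7))
res
```

With left.open = TRUE, a B value stamped exactly at an A timestamp is skipped, which matches the expected output in the question (2011-09-09 gets 1.1). Dropping left.open gives at-or-before matching instead (2011-09-09 would then get 1.5), which is what merging the two series and carrying values forward with na.locf() produces.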

From bbolker at gmail.com Thu Oct 6 04:35:46 2011 From: bbolker at gmail.com (Ben Bolker) Date: Thu, 6 Oct 2011 02:35:46 +0000 Subject: [R] Difficulty with lme References: Message-ID:
Kevin Wright writes:

> Generally, the only way to estimate f1:f2 is if you have all combinations of
> data present for these two factors.

Well, he said it was unbalanced, he didn't say how unbalanced -- i.e. it's not clear (to me) whether there are any completely missing cells or not ...

> On Wed, Oct 5, 2011 at 2:00 PM, Brad Davis wrote:
> >
> > I'm having some difficulty with lme. I am currently trying to run the
> > following simple model
> >
> > anova(lme(x ~ f1 + f2 + f1:f2, data=m, random=~1|r1))

[which you could also specify as ~f1*f2]

> > Which is currently producing the error
> >
> > Error in MEEM(object, conLin, control$niterEM) :
> >   Singularity in backsolve at level 0, block 1
> >
> > x is a numeric vector containing 194 observations. f1 is a factor vector
> > containing two levels, and f2 is a different factor vector containing 5
> > different levels. r1 is another factor vector containing 13 different
> > levels, and it is, again, unbalanced. f1, f2 and r1 are unbalanced, but I
> > can't do anything about it. The data come from wild-caught samples and
> > not from a nice, neat experiment. If I change the model specification slightly,
> > removing the interaction term (e.g. anova(lme(x ~ f1 + f2, data=m,
> > random=~1|r1))), then lme proceeds without producing any errors.
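Kevin's point about missing combinations can be checked before refitting: cross-tabulate the two fixed factors and look for zero counts, since an f1:f2 combination with no observations makes the interaction model rank-deficient. A sketch on simulated data; the data frame, the level names, and the dropped B:L5 cell are all hypothetical stand-ins for the poster's `m`.

```r
# Hypothetical stand-in for the poster's data frame `m` (194 rows)
set.seed(42)
m <- data.frame(
  f1 = factor(sample(c("A", "B"), 194, replace = TRUE)),
  f2 = factor(sample(paste0("L", 1:5), 194, replace = TRUE))
)
# Simulate one completely empty cell by dropping every B:L5 row
# (the factor levels themselves are kept, so the cell still shows up)
m <- m[!(m$f1 == "B" & m$f2 == "L5"), ]

# Cross-tabulate the fixed factors; a zero count is an unestimable f1:f2 cell
tab <- table(m$f1, m$f2)
empty <- which(tab == 0, arr.ind = TRUE)
empty_cells <- paste(rownames(tab)[empty[, 1]], colnames(tab)[empty[, 2]], sep = ":")
print(tab)
print(empty_cells)

# A one-way factor built only from the combinations actually observed
cell <- interaction(m$f1, m$f2, drop = TRUE)
nlevels(cell)
```

The drop = TRUE argument to interaction() discards unobserved combinations, which is one way to build the one-way factor in Ben's second suggestion when some cells are empty.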
I have a couple of suggestions:

(1) try lmer (it will at least work differently, and might work better)
(2) try expanding your model out to a one-way design --
    lme(x~interaction(f1,f2),data=m,random=~1|r1)

Follow-ups should probably be sent to r-sig-mixed-models at r-project.org

From djmuser at gmail.com Thu Oct 6 04:37:37 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 5 Oct 2011 19:37:37 -0700 Subject: [R] aggregate function with a dataframe for both "x" and "by" In-Reply-To: References: Message-ID:
Hi:

It's a little tricky to read in a data frame 'by hand' without making NA a default missing value; you've got to trick it a bit. I'm doing this inefficiently, but if you have the two 'real' data sets stored in separate files, read.table() is the way to go since it provides an option for defining the form of the missing values. data.frame() doesn't have that option. To that end,

mydata <- data.frame(testvar1=c(1,3,5,7,8,3,5,NA,4,5,7,9),
                     testvar2=c(11,33,55,77,88,33,55,NA,44,55,77,99))
mybys <- data.frame(mbn1=c('red','blue',1,2,NA,'big',1,2,'red',1,NA,12),
                    mbn2=c('wet','dry',99,95,NA,'damp',95,99,'red',99,NA,NA),
                    stringsAsFactors = FALSE)
mybys
# You can tell that NA is a missing value since it is printed as <NA> (character NA).

# This is not the method you want, but the following 'works':
lines <- "
mbn1 mbn2
red wet
blue dry
1 99
2 95
NA NA
big damp
1 95
2 99
red red
1 99
NA NA
12 NA"
mybys <- read.table(textConnection(lines), stringsAsFactors = FALSE, na.strings = "")
closeAllConnections()
# Now mybys treats NA as a character string.
# Moral: if NA is a legitimate value, keep the data in an external file
# and read it in with read.table(), using the na.strings = argument
# to specify an alternative missing value string. You don't need to
# use textConnection() as I did if you have the data in an external file.

Next problem: mydata has 12 rows, mybys has 13.
I combined the two data frames with cbind() using only the first 12 rows of mybys, and then used the ddply() function from the plyr package to do the groupwise summation in addition to aggregate(). [Several other packages would also work here, including doBy and data.table.] The results are slightly different.

# Combined data:
myd <- cbind(mydata, mybys[1:12, ])

# Summation function with na.rm = TRUE:
# Will return 0 if all values of x are NA
sfun <- function(x) sum(x, na.rm = TRUE)

# aggregate() version:
aggregate(cbind(testvar1, testvar2) ~ V1 + V2, data = myd, FUN = sfun)
    V1   V2 testvar1 testvar2
1    2   95        8       88
2    1   99       14      154
3    2   99        4       44
4  big damp        5       55
5 blue  dry        5       55
6 mbn1 mbn2        1       11
7   NA   NA       12      132
8  red  red        5       55
9  red  wet        3       33

# ddply() version:
library('plyr')
ddply(myd, .(V1, V2), colwise(sfun, c('testvar1', 'testvar2')))
     V1   V2 testvar1 testvar2
1     1   95        0        0
2     1   99       14      154
3     2   95        8       88
4     2   99        4       44
5   big damp        5       55
6  blue  dry        5       55
7  mbn1 mbn2        1       11
8    NA   NA       12      132
9   red  red        5       55
10  red  wet        3       33

Hope this is what you were after.

Dennis

On Wed, Oct 5, 2011 at 4:45 PM, Eva Powers wrote:
> I have 2 dataframes. "mydata" contains numerical data. "mybys" contains
> information on the "group" each row of the data is in. I wish to aggregate
> each column in mydata using the corresponding column in mybys.
>
> Please see the example below. What is a more elegant or "better" way to
> accomplish this task?
>
> Thanks!
>
> mydata = data.frame(testvar1=c(1,3,5,7,8,3,5,NA,4,5,7,9),
>                     testvar2=c(11,33,55,77,88,33,55,NA,44,55,77,99))
>
> mybys=data.frame(mbn1=c('red','blue',1,2,NA,'big',1,2,'red',1,NA,12),mbn2=c('wet','dry',99,95,NA,'damp',95,99,'red',99,NA,NA),
>                  stringsAsFactors =F)
>
> myaggs <- data.frame(matrix(data=NA, nrow=nrow(mydata), ncol=ncol(mydata)))
>
> for(i in 1:ncol(mydata)) {
>   temp <- aggregate(mydata[i], by = as.list(mybys[i]), FUN=sum, na.rm=T)
>   rownums <- match(mybys[,i],temp[,1])
>   myaggs[,i] <- temp[rownums,2]
> }
> myaggs
>
> Finally, how do I convert and use "mybys" to factors, so that I can tell R
> that the NA values form a group?
>
> I tried substituting this line above:
>
> temp <- aggregate(mydata[,i], by = as.list(mybys[,i]), FUN=sum, na.rm=T)
>
> ... but get the error message: "Error in
> aggregate.data.frame(as.data.frame(x), ...) :
>   arguments must have same length"

From jdnewmil at dcn.davis.ca.us Thu Oct 6 02:20:36 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Wed, 05 Oct 2011 17:20:36 -0700 Subject: [R] any way to convert back to DateTime class when "accidental" conversion to numeric? In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed...
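Eva's loop sums each column of mydata within the groups given by the matching column of mybys and writes each group total back onto its rows. A more compact base-R sketch uses ave() for the per-row group totals and Map() to pair the columns. group_sum() is a hypothetical helper, and recoding missing labels to "<NA>" (assumed not to occur as a real label) is what makes NA a group of its own. As for the error message: as.list(mybys[, i]) splits the length-12 vector into 12 separate grouping vectors of length one, so aggregate() reports "arguments must have same length"; list(mybys[, i]) passes it as a single grouping variable.

```r
mydata <- data.frame(testvar1 = c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9),
                     testvar2 = c(11, 33, 55, 77, 88, 33, 55, NA, 44, 55, 77, 99))
mybys <- data.frame(mbn1 = c('red', 'blue', 1, 2, NA, 'big', 1, 2, 'red', 1, NA, 12),
                    mbn2 = c('wet', 'dry', 99, 95, NA, 'damp', 95, 99, 'red', 99, NA, NA),
                    stringsAsFactors = FALSE)

# Sum x within the groups in g and return each row's group total.
# Missing labels are recoded to "<NA>" so they form their own group
# (assumes "<NA>" is not already used as a real label).
group_sum <- function(x, g) {
  g <- ifelse(is.na(g), "<NA>", as.character(g))
  ave(x, g, FUN = function(v) sum(v, na.rm = TRUE))
}

# Pair column 1 of mydata with column 1 of mybys, column 2 with column 2, ...
myaggs <- as.data.frame(Map(group_sum, mydata, mybys))
myaggs
```

Unlike the aggregate() version, rows whose own value is NA still receive their group's total, because ave() writes the total back to every row in the group.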
Name: not available URL:

From Darko.Roupell at cba.com.au Thu Oct 6 03:22:39 2011 From: Darko.Roupell at cba.com.au (Roupell, Darko) Date: Thu, 6 Oct 2011 12:22:39 +1100 Subject: [R] [R-SIG-Finance] AsOf join in R In-Reply-To: References: Message-ID:
Btw tweaking MatchTradesQuotes should not be an issue and its easy to accommodate any data format that is passed through in xts object.

At least that's what I did with RTAQ package - used it as a shell to create unique functions that suit data format for intra-day tick flow from ASX.

Hope this helps.

-----Original Message-----
From: Robert A'gata [mailto:rhelpacc at gmail.com]
Sent: Thursday, 6 October 2011 12:08 PM
To: Roupell, Darko; r-help at r-project.org; r-sig-finance at stat.math.ethz.ch
Subject: Re: [R-SIG-Finance] AsOf join in R

Hi Roupell,

Yes I am aware of RTAQ function matchTradesQuotes. But my time series does not follow the TAQ format like they suggest. So I gave it a try and find that it doesn't work. In particular, my time series contain full level 2 order book and trades. I want to do asof join of the book to the trade. Is there any better way? Or I misunderstand anything about the RTAQ's function? Thank you.

Robert

On Wed, Oct 5, 2011 at 1:59 AM, Roupell, Darko wrote:
> na.locf {zoo}, should do a job, also if you look in RTAQ they wrote a function that looks on previous tick and carries forward value in case its not available.
> > -----Original Message----- > From: r-sig-finance-bounces at r-project.org [mailto:r-sig-finance-bounces at r-project.org] On Behalf Of Robert A'gata > Sent: Wednesday, 5 October 2011 2:41 PM > To: r-help at r-project.org; r-sig-finance at stat.math.ethz.ch > Subject: [R-SIG-Finance] AsOf join in R > > Hi, > > I tried to google for any solution for asof join operator in R. But I > couldn't find one. The asof join operator AsOf(A,B) merges 2 time > series by looking for latest available value of B prior to each time > point in A. For example, > > A <- xts(c(10,15,20,25), > order.by=as.POSIXct(c("2011-09-01","2011-09-09","2011-09-10","2011-09-15"))) > > B <- xts(c(1.1,1.5,1.3,1.7), > order.by=as.POSIXct(c("2011-08-31","2011-09-09","2011-09-11","2011-09-12"))) > > AsOf(A,B) should return > > A B > 2011-09-01 10 1.1 > 2011-09-09 15 1.1 # (because the latest value of B prior to > 2011-09-09 is 1.1) > 2011-09-10 20 1.5 > 2011-09-15 25 1.7 > > How do I write the above AsOf function in R? The merge function does > not do what I want because it will align points that have the same > time stamp together while what I want is actually latest value prior > to timestamp in A. Any example would be greatly appreciated. Thank > you.
> > Cheers, > > Robert > > _______________________________________________ > R-SIG-Finance at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R questions should go. > > ************** IMPORTANT MESSAGE ***************************** > This e-mail message is intended only for the addressee(s) and contains information which may be > confidential. > If you are not the intended recipient please advise the sender by return email, do not use or > disclose the contents, and delete the message and any attachments from your system. Unless > specifically indicated, this email does not constitute formal advice or commitment by the sender > or the Commonwealth Bank of Australia (ABN 48 123 123 124) or its subsidiaries. > We can be contacted through our web site: commbank.com.au. > If you no longer wish to receive commercial electronic messages from us, please reply to this > e-mail by typing Unsubscribe in the subject line. > **************************************************************
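A minimal base-R sketch of the as-of join asked about in the quoted question, offered as an illustration rather than as the RTAQ or zoo approach discussed in the thread. It assumes both series are sorted by time and that "prior" means strictly earlier, which is what the expected 1.1 on 2011-09-09 implies. The helper name as_of_join is hypothetical, and plain Date vectors stand in for xts objects so the sketch runs without extra packages; the left.open argument of findInterval() needs R 3.3.0 or later.

```r
# Hypothetical helper (not part of xts, zoo or RTAQ): for each time in a_time,
# take the latest b_val whose time stamp is strictly earlier than it.
as_of_join <- function(a_time, a_val, b_time, b_val) {
  stopifnot(!is.unsorted(b_time))  # findInterval() needs sorted breakpoints
  # With left.open = TRUE, idx[k] is the number of b_time entries < a_time[k]
  idx <- findInterval(as.numeric(a_time), as.numeric(b_time), left.open = TRUE)
  b_matched <- rep(NA_real_, length(idx))  # NA where no earlier B value exists
  b_matched[idx > 0] <- b_val[idx[idx > 0]]
  data.frame(time = a_time, A = a_val, B = b_matched)
}

a_time <- as.Date(c("2011-09-01", "2011-09-09", "2011-09-10", "2011-09-15"))
b_time <- as.Date(c("2011-08-31", "2011-09-09", "2011-09-11", "2011-09-12"))
res <- as_of_join(a_time, c(10, 15, 20, 25), b_time, c(1.1, 1.5, 1.3, 1.7))
print(res)
```

With the question's data this yields B = 1.1, 1.1, 1.5, 1.7. Dropping left.open = TRUE would also accept a B value stamped at exactly the same time, giving 1.5 on 2011-09-09, which is the behaviour of merging the two series and carrying values forward with na.locf().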
From 1 at VictoriasJourney.com Thu Oct 6 04:41:58 2011 From: 1 at VictoriasJourney.com (Victoria_Stuart) Date: Wed, 5 Oct 2011 19:41:58 -0700 (PDT) Subject: [R] Entering data into a multi-way array? In-Reply-To: <1317501510635-3863670.post@n4.nabble.com> References: <1317438749773-3862054.post@n4.nabble.com> <1317501510635-3863670.post@n4.nabble.com> Message-ID: <1317868918560-3876874.post@n4.nabble.com> Solution: I figured this out on my own (below). > gnames<-read.csv("/home/victoria/R/gnames.csv",header=FALSE,sep=",") > gnames V1 V2 V3 V4 V5 V6 V7 1 NM_005588 NM_004407 NM_006136 NM_004817 NM_006012 NM_001008693 NM_181435 [snip] V497 V498 V499 V500 1 NM_152546 XM_375762 NM_020696 NM_138459 # Saved 500 genes (values only): 3 replicates of (8 time points for treatment 1) + 3 replicates of (8 time points for treatment 2) = 500 x 48 .csv file > X<-matrix(scan("/home/victoria/R/X.csv",n=500*48),500,48,byrow=TRUE) Read 24000 items > dim(X) [1] 500 48 > dim(X)<-c(500,3,8,2) # Looks good: Checked all, below, against 'source' file "X.ods" : OK! 
, , 1, 1 # = Time Point 1 replicates (columns) for Treatment 1 [,1] [,2] [,3] [1,] -3.2093000000 -2.1463000000 0.347800000 [2,] 0.1689000000 -0.0945000000 -0.383400000 [3,] -0.0487000000 -0.8695000000 -1.990200000 [snip] , , 2, 1 # = Time Point 2 replicates (columns) for Treatment 1 [,1] [,2] [,3] [1,] -2.8531 -3.2044000000 -1.3212 [2,] 0.5820 0.1358000000 0.4183 [3,] 1.0075 0.3211000000 -1.7940 [snip] , , 8, 1 # = Time Point 8 replicates (columns) for Treatment 1 [,1] [,2] [,3] [1,] -2.3661000000 -1.8353000000 0.0682 [2,] 0.6607000000 0.0726000000 -0.1255 [3,] -0.4960000000 -1.8894000000 -0.9466 [snip] , , 1, 2 # = Time Point 1 replicates (columns) for Treatment 2 [,1] [,2] [,3] [1,] -2.0151000000 -1.4385 -0.9281 [2,] -0.0482000000 0.2382 -0.1319 [3,] -1.3375000000 -1.4805 -1.4416 [snip] , , 2, 2 # = Time Point 2 replicates (columns) for Treatment 2 [,1] [,2] [,3] [1,] -2.3212 -1.3964 -1.1155000000 [2,] 0.0264 0.2057 0.1566000000 [3,] -0.3569 -0.9750 -0.8666000000 [snip] , , 7, 2 # = Time Point 7 replicates (columns) for Treatment 2 [,1] [,2] [,3] [1,] -1.2615000000 -2.1288000000 -1.8844 [2,] 0.1048000000 -0.2306000000 -0.0634 [3,] -1.4571000000 0.7725000000 -1.0384 [snip] , , 8, 2 # = Time Point 8 replicates (columns) for Treatment 2 [,1] [,2] [,3] [1,] -0.0819 -1.8137000000 -1.4085 [2,] -0.1336 0.2744000000 1.2913 [3,] -1.1404 -0.7546000000 -1.2400 [snip] Thanks anyway - Victoria :-) -- View this message in context: http://r.789695.n4.nabble.com/Entering-data-into-a-multi-way-array-tp3862054p3876874.html Sent from the R help mailing list archive at Nabble.com. From jpnolan at american.edu Thu Oct 6 05:01:26 2011 From: jpnolan at american.edu (John Nolan) Date: Wed, 5 Oct 2011 23:01:26 -0400 Subject: [R] Titles changing when a plot is redrawn Message-ID: I ran into a problem with titles on graphs. I wanted a graph with multiple subplots, with each having a title that involved both a Greek letter and an identifier for each graph. 
Below is a simplified version of code to do this. The graph appears fine, with the first graph having "i=1" in the title, and the second graph having "i=2" in the title. However, when I resize the graph, the plot titles change, with both showing "i=2". The titles also change when I save the plot to a file using the "File" menu, then "Save as" in Windows. Is this what should happen? I always thought that titles are static once the graph is drawn, and couldn't change. The problem occurs on some versions of R, but not on others. It does occur with the latest version of R: > str(R.Version()) List of 13 $ platform : chr "i386-pc-mingw32" $ arch : chr "i386" $ os : chr "mingw32" $ system : chr "i386, mingw32" $ status : chr "" $ major : chr "2" $ minor : chr "13.2" $ year : chr "2011" $ month : chr "09" $ day : chr "30" $ svn rev : chr "57111" $ language : chr "R" $ version.string: chr "R version 2.13.2 (2011-09-30)" The problem also occurs on: R 2.13.0 on Win32 and Mac (R 2.12.0, x86_64-apple-darwin9.8.0) The problem DOES NOT occur under R 2.10.0 on Win32. If the code below is bracketed with pdf("test.pdf") and dev.off(), the correct labels appear in the file. This behavior doesn't seem to appear if there is only one plot. My guess is that the titles are being reevaluated when the plot is redrawn, and since the value of i is 2 when the redraw occurs, both labels get set to "i=2". I guess "Save as" forces a redraw because a dialog box pops up? It could be that this behavior is what is intended, and that somewhere between R 2.10.0 and R 2.13.2 an old bug was fixed. Or this behavior is not what was intended, and a bug was introduced. If the former, this should be explained to the user somewhere. If the latter, can someone track it down and fix it?
John Nolan #------------------------------------------------- par(mfrow=c(2,1)) for (i in 1:2) { x <- 1:100 rmse <- sin(x/5) # fake data plot(x,rmse) str1 <- bquote( paste("RMSE(",theta,"), ",i==.(i) )) title( str1 ) } #------------------------------------------------- ........................................................................... John P. Nolan Math/Stat Department 227 Gray Hall American University 4400 Massachusetts Avenue, NW Washington, DC 20016-8050 jpnolan at american.edu 202.885.3140 voice 202.885.3155 fax http://academic2.american.edu/~jpnolan ........................................................................... From ccquant at gmail.com Thu Oct 6 05:11:57 2011 From: ccquant at gmail.com (Ben qant) Date: Wed, 5 Oct 2011 21:11:57 -0600 Subject: [R] counts in quantiles in and from a matrix Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djmuser at gmail.com Thu Oct 6 05:18:19 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 5 Oct 2011 20:18:19 -0700 Subject: [R] counts in quantiles in and from a matrix In-Reply-To: References: Message-ID: Hi: Here's one way: m <- matrix(rpois(100, 8), nrow = 5) f <- function(x) { q <- quantile(x, c(0.1, 0.9), na.rm = TRUE) c(sum(x < q[1]), sum(x > q[2])) } t(apply(m, 1, f)) HTH, Dennis On Wed, Oct 5, 2011 at 8:11 PM, Ben qant wrote: > Hello, > > I'm trying to get the count of values in each row that are above and below > quantile thresholds. Thanks! > > Example: > >> x = matrix(1:30,5,6) >> x > [,1] [,2] [,3] [,4] [,5] [,6] > [1,] 1 6 11 16 21 26 > [2,] 2 7 12 17 22 27 > [3,] 3 8 13 18 23 28 > [4,] 4 9 14 19 24 29 > [5,] 5 10 15 20 25 30 >> qtl = t(apply(x, 1, quantile, probs = c(.1,.9),na.rm=T)) >> qtl > 10% 90% > [1,] 3.5 23.5 > [2,] 4.5 24.5 > [3,] 5.5 25.5 > [4,] 6.5 26.5 > [5,] 7.5 27.5 > > I would like counts like this for each row: > > cnts > [,1] [,2] > [1,]
1 1 > [2,] 1 1 > [3,] 1 1 > [4,] 1 1 > [5,] 1 1 > > ...because for the first row (x[1,]) only value 1 is less than 3.5 and only > value 26 is greater 23.5 and so on for the other rows. I'm thinking its a > apply(x,1,...some FUN here...), but still getting use to apply and I've been > coding for too long... > > Also, if anyone knows how to change the background color of the r-Tinn > editor my eyes would love you! Off to bed. I look forward to your answers! > > Thanks! > > Ben > From ggrothendieck at gmail.com Thu Oct 6 05:22:40 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Wed, 5 Oct 2011 23:22:40 -0400 Subject: [R] reporting multiple objects out of a function In-Reply-To: <1317788874982-3873380.post@n4.nabble.com> References: <1317788874982-3873380.post@n4.nabble.com> Message-ID: On Wed, Oct 5, 2011 at 12:27 AM, andrewH wrote: > Dear folks, > > I'm trying to build a function to create and make available some variables I > frequently use for testing purposes. Suppose I have a function that takes > some inputs and creates (internally) several named objects. Say, > > fun1 <- function(x, y, z) {obj1 <- x; obj2 <- y; obj3 <- z > > } > > Here is the challenge: After I run it, I want the objects to be available in > the calling environment, but not necessarily in the global environment. I > want them to be individually available, not as part of a list or some larger > object. I can not figure out how to do this.
If I understand the situation > correctly, I am trying to move several separate objects from the environment > of the function to the environment in which the function was invoked (the > "calling environment," yes?). > > I'm pretty sure there is a command to do this, but I'm not sure how to find > it. Any help would be greatly appreciated, either on the necessary code, or > on how to search for it, or a reference to a good discussion of this family > of problems. If the question is how to write things into the calling environment (also called the parent frame in R) then it's like this: fun1 <- function(x, y, z, env = parent.frame()) { env$x <- x env$y <- y env$z <- z } fun1(1, 2, 3) x # 1 However, this seems very close to object oriented programming where fun1 is a method and x, y and z are properties and might represent a better organization of your program. For example, library(proto) p <- proto(fun1 = function(., x, y, z) { .$x <- x .$y <- y .$z <- z }) p$fun1(1, 2, 3) # set properties x, y, z p$x # 1 -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From dwinsemius at comcast.net Thu Oct 6 05:47:39 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 5 Oct 2011 23:47:39 -0400 Subject: [R] aggregate function with a dataframe for both "x" and "by" In-Reply-To: References: Message-ID: On Oct 5, 2011, at 7:45 PM, Eva Powers wrote: > I have 2 dataframes. "mydata" contains numerical data. "mybys" > contains > information on the "group" each row of the data is in. I wish to > aggregate > each column in mydata using the corresponding column in mybys. corresponding? > > Please see the example below. What is a more elegant or "better" > way to > accomplish this task?
> > mydata = data.frame(testvar1=c(1,3,5,7,8,3,5,NA,4,5,7,9), > testvar2=c(11,33,55,77,88,33,55,NA,44,55,77,99) ) > > mybys=data.frame(mbn1=c('red','blue',1,2,NA,'big',1,2,'red',1,NA, > 12),mbn2=c('wet','dry',99,95,NA,'damp',95,99,'red',99,NA,NA) , > stringsAsFactors =F) > > myaggs <- data.frame(matrix(data=NA, nrow=nrow(mydata), > ncol=ncol(mydata) ) ) > > for(i in 1: ncol(mydata) ) {temp <- aggregate(mydata[i], by = > as.list(mybys[i]), FUN=sum, na.rm=T) > rownums <- match(mybys[,i],temp[,1]) > myaggs[,i] <- temp[rownums,2] } > myaggs > > Finally, how do I convert and use "mybys" to factors, so that I can > tell R > that the NA values form a group? > > I tried substituting this line above: > > temp <- aggregate(mydata[,i], by = as.list(mybys[,i]), FUN=sum, > na.rm=T) > > ... but get the error message: "Error in > aggregate.data.frame(as.data.frame(x), ...) : > arguments must have same length" > David Winsemius, MD West Hartford, CT From xie at yihui.name Thu Oct 6 05:49:24 2011 From: xie at yihui.name (Yihui Xie) Date: Wed, 5 Oct 2011 22:49:24 -0500 Subject: [R] Titles changing when a plot is redrawn In-Reply-To: References: Message-ID: I think the problem is your str1 is an unevaluated expression and will change with the value of i. You should be able to get a fixed title by this: par(mfrow = c(2, 1)) for (i in 1:2) { x <- 1:100 rmse <- sin(x/5) # fake data plot(x, rmse, main = substitute(list(RMSE(theta), i == z), list(z = i))) } Regards, Yihui -- Yihui Xie Phone: 515-294-2465 Web: http://yihui.name Department of Statistics, Iowa State University 2215 Snedecor Hall, Ames, IA From ustaudinger at gmail.com Thu Oct 6 08:15:58 2011 From: ustaudinger at gmail.com (Ulrich Staudinger) Date: Thu, 06 Oct 2011 08:15:58 +0200 Subject: [R] [R-SIG-Finance] AsOf join in R In-Reply-To: References: Message-ID: <4E8D479E.10304@gmail.com> A bit late, but here is what I always do: m = merge(bid, ask, tick) m <- interpNA(m, method="before") interpNA can also interpolate NAs in different ways, for example linearly. Hth, Ulrich From ripley at stats.ox.ac.uk Thu Oct 6 08:41:04 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Thu, 6 Oct 2011 07:41:04 +0100 (BST) Subject: [R] any way to convert back to DateTime class when "accidental" conversion to numeric?
In-Reply-To: References: Message-ID: A more portable way (that function only works in some versions of R) is as.POSIXct(1317857320, origin="1970-01-01") possibly with a 'tz' argument if you need to restore the timezone. On Wed, 5 Oct 2011, jim holtman wrote: > Here is what I use: > > unix2POSIXct(1317857320) > [1] "2011-10-05 19:28:40 EDT" > > > unix2POSIXct <- function (time) structure(time, class = c("POSIXt", > "POSIXct")) > > > On Wed, Oct 5, 2011 at 7:38 PM, Mike Williamson wrote: >> Hi, >> >> In short, I would like to know if there is any way to convert a numeric >> into a date, similar to how strptime() can convert a string to a date time >> class? >> >> There are some functions, etc. which don't work well with dates, and >> tend to force them into numerics. I understand that the number it spits >> back is the number of seconds since the beginning of 1970 (see the first few >> sentences of the "Details" portion of ?DateTimeClasses). >> However, it's a bit of a hassle to convert that by hand. I can create a >> function to do this, and it isn't so hard, but I found it hard to believe >> such a function didn't already exist, so I wanted to ask the community. >> >> As an example, today (Oct 5th 2011 at approximately 4:30pm, Pacific >> time) is approximately 1317857320 as a numeric, but I would like to know how >> to go from that number back to the "2011-10-05 16:28:39 PDT" date time class >> which originally generated it. >> >> Thanks! >> Mike >> >> --- >> XKCD >> > > > > -- > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve?
-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From parallax at lafn.org Thu Oct 6 08:40:39 2011 From: parallax at lafn.org (SML) Date: Wed, 5 Oct 2011 23:40:39 -0700 Subject: [R] Mean(s) from values in different row? Message-ID: <20111005234039.6372f317.parallax@lafn.org> Hello: Is there a way to get a mean from values stored in different rows? The data looks like this: YEAR-1, JAN, FEB, ..., DEC YEAR-2, JAN, FEB, ..., DEC YEAR-3, JAN, FEB, ..., DEC What I want is the mean(s) for just the consecutive winter months: YEAR-1.DEC, YEAR-2.JAN, YEAR-2.FEB YEAR-2.DEC, YEAR-3.JAN, YEAR-3.FEB etc. Thanks. From alaios at yahoo.com Thu Oct 6 09:18:38 2011 From: alaios at yahoo.com (Alaios) Date: Thu, 6 Oct 2011 00:18:38 -0700 (PDT) Subject: [R] Concecutive zeros and ones Message-ID: <1317885518.51396.YahooMailNeo@web120118.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From d.rizopoulos at erasmusmc.nl Thu Oct 6 09:22:16 2011 From: d.rizopoulos at erasmusmc.nl (Dimitris Rizopoulos) Date: Thu, 06 Oct 2011 09:22:16 +0200 Subject: [R] Concecutive zeros and ones In-Reply-To: <1317885518.51396.YahooMailNeo@web120118.mail.ne1.yahoo.com> References: <1317885518.51396.YahooMailNeo@web120118.mail.ne1.yahoo.com> Message-ID: <4E8D5728.9060508@erasmusmc.nl> Check function rle(). I hope it helps.
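A minimal sketch of the rle() suggestion applied to the 0/1 series in Alex's question (0 0 0 1 1 0 1 0 0 1 1 1 1 0 0 0). rle() returns the length of each run together with the value of that run, so the zero-runs and one-runs can be picked out by indexing on the run values; the variable names are illustrative only.

```r
x <- c(0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0)
r <- rle(x)                        # run lengths plus the value of each run
zeros <- r$lengths[r$values == 0]  # lengths of the runs of 0s
ones  <- r$lengths[r$values == 1]  # lengths of the runs of 1s
zeros
ones
```

This gives zeros = 3 1 2 3 and ones = 2 1 4, the result asked for. rle() expects a plain vector, so a matrix would first need as.vector(), which reads it column by column.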
Best, Dimitris On 10/6/2011 9:18 AM, Alaios wrote: > Dear all, > I have a data series (might be vector or matrix) which is composed only from zeros and ones like the following example > > 0 0 0 1 1 0 1 0 0 1 1 1 1 0 0 0 > > I want to be able to return back the length of concecutive zeros and the length of concecutive ones. > > For that I want to have something like that returned: > > > zeros= [3 1 2 3]; > ones=[2 1 4]; > > How I can do that simply in R? > > I would like to thank you in advance for your help > > > B.R > Alex > -- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014 Web: http://www.erasmusmc.nl/biostatistiek/ From fernando.cabrera at nordea.com Thu Oct 6 09:42:56 2011 From: fernando.cabrera at nordea.com (fernando.cabrera at nordea.com) Date: Thu, 6 Oct 2011 10:42:56 +0300 Subject: [R] Populate a matrix In-Reply-To: <2885999.ONQgauKSmv@augeatur> References: <2885999.ONQgauKSmv@augeatur> Message-ID: This last solution is what I was looking for, I was trying to avoid loops. Thanks! -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rainer Schuermann Sent: 5.
October 2011 18:29 To: r-help at r-project.org Subject: Re: [R] Populate a matrix m <- matrix( rep( y, length( x ) ), length( y ), length( x ) ) On Wednesday 05 October 2011 18:11:18 fernando.cabrera at nordea.com wrote: > Hi guys > > I have vectors x <- c(1,2,3,4) and y <- c(4,3,9) and would like to generate a matrix which has 3 rows (length(y)) and 4 columns (length(x)), and each row is the corresponding y element repeated length(x) times. > > 4,4,4,4 > 3,3,3,3 > 9,9,9,9 > > Thanks. > > Fernando Álvarez From paul.hiemstra at knmi.nl Thu Oct 6 09:50:01 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Thu, 06 Oct 2011 07:50:01 +0000 Subject: [R] kriging shapefiles In-Reply-To: <1317805629.20625.YahooMailNeo@web110713.mail.gq1.yahoo.com> References: <1317805629.20625.YahooMailNeo@web110713.mail.gq1.yahoo.com> Message-ID: <4E8D5DA9.1020604@knmi.nl> From paul.hiemstra at knmi.nl Thu Oct 6 09:55:26 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Thu, 06 Oct 2011 07:55:26 +0000 Subject: [R] reporting multiple objects out of a function In-Reply-To: <1317849242871-3876201.post@n4.nabble.com> References: <1317788874982-3873380.post@n4.nabble.com> <1317835117366-3875488.post@n4.nabble.com> <1317849242871-3876201.post@n4.nabble.com> Message-ID: <4E8D5EEE.9030809@knmi.nl> On 10/05/2011 09:14 PM, andrewH wrote: > Thanks, Sina!
This is very helpful and informative, but still not quite what > I want. > > So, here is the thing: When a function returns an object, that object is > available in the calling environment. If it is returned inside a function, > it is available in the function, but not outside of the function. What I > want to do is simply to return more than one object in the usual sense in > which functions return objects. Hi, As I understand it, you want to return multiple objects without explicitly returning them as a single object. This can probably be done, but I would advise against it because it makes your code harder to read. You dump something in the calling environment, and a new user (maybe yourself in a few months) has to do a lot of reasoning about what is happening under the hood, i.e. which object is dumped in which environment. I would just return a list. Alternatively, take a look at object-oriented programming as Gabor suggested. This, however, still involves returning an object. Again, I would recommend doing this the standard R way. Cheers, Paul > Here is a test to see if a function fun does this, at least to the depth of > 1. > > obj1 <- 1 > obj2 <- 2 > > cat("obj1 in global=", obj1) > cat("obj2 in global=", obj2) > > wrapFun <- function(fun) { > obj1 <- 3 > obj2 <- 4 > cat("obj1 in calling=", obj1) > cat("obj2 in calling=", obj2) > fun() > cat("obj in calling=", obj) > cat("obj1 in calling=", obj1) > cat("obj2 in calling=", obj2) > } > > cat("obj1 in global=", obj1) > cat("obj2 in global=", obj2) > > > Suppose the function "fun" assigns the values 5 and 6 to obj1 and obj2.
If > the function does what I want, this code should print: > obj1 in global= 1 > obj2 in global= 2 > obj1 in calling= 3 > obj2 in calling= 4 > obj1 in calling= 5 > obj2 in calling= 6 > obj1 in global= 1 > obj2 in global= 2 > > I turned Paul's and Sina's code into functions as follows: > paulFun <- function() { > obj1 <<- 5; > obj2 <<- 6; > } > > sinaFun <- function() { > attach(what = NULL, name = "my_env") > assign("obj1", 5, envir = as.environment("my_env")) > assign("obj1", 5, envir = as.environment("my_env")) > } > > Running these two functions in the code above yields: > > paulFun: > obj1 in global= 1 > obj2 in global= 2 > obj1 in calling= 3 > obj2 in calling= 4 > obj1 in calling= 3 > obj2 in calling= 4 > obj1 in global= 5 > obj2 in global= 6 > > So paulFun puts the objects in the global environment but not in the calling > environment. Let's try sinaFun: > > sinaFun: > obj1 in global= 1 > obj2 in global= 2 > obj1 in calling= 3 > obj2 in calling= 4 > obj1 in calling= 3 > obj2 in calling= 4 > obj1 in global= 1 > obj2 in global= 2 > > sinaFun puts the objects in the new environment it defines, but they are > available in neither the calling nor the global environment. However, I was > immediately convinced that Sina had given me the tool I was missing: the > assign function. (Thanks, Sina!) But I was wrong (or used it wrong), and > now I am even more deeply confused. Here is a function that I thought would > do what I want: > > andrewFun <- function() { > assign("obj1", 5, pos = sys.parent(n = 1)) > assign("obj2", 6, pos = sys.parent(n = 1)) > NULL > } > > However, when I tried it, my results were the same as paulFun: assigned in > the global environment, but not in the calling environment. Setting n = 0 > seemed to limit the assignment to the interior of andrewFun: none of the > printed obj values were affected. > > Help?
> > andrewH -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From alaios at yahoo.com Thu Oct 6 10:03:53 2011 From: alaios at yahoo.com (Alaios) Date: Thu, 6 Oct 2011 01:03:53 -0700 (PDT) Subject: [R] apply and functions with many arguments Message-ID: <1317888233.78839.YahooMailNeo@web120108.mail.ne1.yahoo.com> From tal.galili at gmail.com Thu Oct 6 10:05:16 2011 From: tal.galili at gmail.com (Tal Galili) Date: Thu, 6 Oct 2011 10:05:16 +0200 Subject: [R] anova.rq {quantreg) - Why do different level of nesting changes the P values?! Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From claudia.beleites at ipht-jena.de Thu Oct 6 10:04:19 2011 From: claudia.beleites at ipht-jena.de (Claudia Beleites) Date: Thu, 6 Oct 2011 10:04:19 +0200 Subject: [R] speed up this algorithm (apply-fuction / 4D array) In-Reply-To: References: <4C6DA2DA-1ABB-433C-BEFE-57F66689ED17@googlemail.com> Message-ID: <4E8D6103.1080605@ipht-jena.de> here's another one - which is easier to generalize: x <- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim=c(50, 50, 50, 91)) y <- x [,,,1:90] # decide yourself what to do with slice 91, but # 91 is not divisible by 3 system.time ({ dim (y) <- c (50, 50, 50, 3, 90 %/% 3) y <- aperm (y, c (4, 1:3, 5)) v2 <- colMeans (y) }) user system elapsed 0.32 0.08 0.40 (my computer is a bit slower than Bill's:) > system.time (v1 <- f1 (x)) user system elapsed 0.360 0.030 0.396 Claudia On 05.10.2011 20:24, William Dunlap wrote: > I corrected your code a bit and put it into a function, f0, to > make testing easier. I also made a small dataset to make > testing easier. Then I made a new function f1 which does > what f0 does in a vectorized manner: > > x<- array(rnorm(50 * 50 * 50 * 91, 0, 2), dim=c(50, 50, 50, 91)) > xsmall<- array(log(seq_len(2 * 2 * 2 * 91)), dim=c(2, 2, 2, 91)) > > f0<- function(x) { > data_reduced<- array(0, dim=c(dim(x)[1:3], trunc(dim(x)[4]/3))) > reduce<- seq(1, dim(x)[4]-1, by=3) > for( i in 1:length(reduce) ) { > data_reduced[ , , , i]<- apply(x[ , , , reduce[i] : (reduce[i]+2) ], 1:3, mean) > } > data_reduced > } > > f1<- function(x) { > reduce<- seq(1, dim(x)[4]-1, by=3) > data_reduced<- (x[, , , reduce] + x[, , , reduce+1] + x[, , , reduce+2]) / 3 > data_reduced > } > > The results were: > > > system.time(v1<- f1(x)) > user system elapsed > 0.280 0.040 0.323 > > system.time(v0<- f0(x)) > user system elapsed > 73.760 0.060 73.867 > > all.equal(v0, v1) > [1] TRUE > >>> "I thought apply would already vectorize, rather than loop over every coordinate." > No, you have that backwards.
Use *apply functions when you cannot figure > out how to vectorize. > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > >> -----Original Message----- >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Batholdy >> Sent: Wednesday, October 05, 2011 10:40 AM >> To: R Help >> Subject: [R] speed up this algorithm (apply-fuction / 4D array) >> >> Hi, >> >> >> I have this sample code (see below) and I was wondering whether it is possible to speed things up. >> >> >> >> What this code does is the following: >> >> x is a 4D array (you can imagine it as x, y, z coordinates and a time coordinate). >> >> So x contains 50x50x50 data arrays for 91 time points. >> >> Now I want to reduce the 91 time points. >> I want to merge three consecutive time points into one time point by calculating the mean of these three >> time points for every x,y,z coordinate. >> >> The reduce sequence defines which time points should get merged. >> And the apply function in the for loop calculates the mean of the three 3D arrays and puts them into a >> new 4D array (data_reduced). >> >> >> >> The problem is that even in this example it takes really long. >> I thought apply would already vectorize, rather than loop over every coordinate. >> >> But for my actual data set it takes a really long time ... So I would be really grateful for any >> suggestions on how to speed this up.
>> >> >> >> >> x<- array(rnorm(50 * 50 * 50 * 90, 0, 2), dim=c(50, 50, 50, 91)) >> >> >> >> data_reduced<- array(0, dim=c(50, 50, 50, 90/3)) >> >> reduce<- seq(1,90, 3) >> >> >> >> for( i in 1:length(reduce) ) { >> >> data_reduced[ , , , i]<- apply(x[ , , , reduce[i] : (reduce[i]+3) ], 1:3, mean) >> } -- Claudia Beleites Spectroscopy/Imaging Institute of Photonic Technology Albert-Einstein-Str. 9 07745 Jena Germany email: claudia.beleites at ipht-jena.de phone: +49 3641 206-133 fax: +49 2641 206-399 From paul.hiemstra at knmi.nl Thu Oct 6 10:07:25 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Thu, 06 Oct 2011 08:07:25 +0000 Subject: [R] apply and functions with many arguments In-Reply-To: <1317888233.78839.YahooMailNeo@web120108.mail.ne1.yahoo.com> References: <1317888233.78839.YahooMailNeo@web120108.mail.ne1.yahoo.com> Message-ID: <4E8D61BD.1010002@knmi.nl> From tal.galili at gmail.com Thu Oct 6 10:09:43 2011 From: tal.galili at gmail.com (Tal Galili) Date: Thu, 6 Oct 2011 10:09:43 +0200 Subject: [R] Mean(s) from values in different row? In-Reply-To: <20111005234039.6372f317.parallax@lafn.org> References: <20111005234039.6372f317.parallax@lafn.org> Message-ID: An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From ccolling at purdue.edu Thu Oct 6 10:56:33 2011 From: ccolling at purdue.edu (Clayton K Collings) Date: Thu, 6 Oct 2011 04:56:33 -0400 (EDT) Subject: [R] expression set (Bioconductor) problem In-Reply-To: Message-ID: <1766332187.163437.1317891393209.JavaMail.root@mailhub039.itcs.purdue.edu> Hello R people, > dim(exprs(estrogenrma)) I have an ExpressionSet with 8 samples and 12695 features (genes) > estrogenrma$estrogen present present absent absent present present absent absent > estrogenrma$time.h 10 10 10 10 48 48 48 48 present <- grep("present", as.character(estrogenrma$estrogen)) absent <- grep("absent", as.character(estrogenrma$estrogen)) ten <- grep("10", as.character(estrogenrma$time.h)) fortyeight <- grep("48", as.character(estrogenrma$time.h)) present.10 <- estrogenrma[, intersect(present, ten)] present.48 <- estrogenrma[, intersect(present, fortyeight)] absent.10 <- estrogenrma[, intersect(absent, ten)] absent.48 <- estrogenrma[, intersect(absent, fortyeight)] present.10, present.48, absent.10, and absent.48 are four ExpressionSets, each with two samples and 12695 features. How can I make two new ExpressionSets, each having 12695 features and one sample, where expressionset1 = (present.10 + present.48) / 2 expressionset2 = (absent.10 + absent.48) / 2 ? Thanks, Clayton ----- Original Message ----- From: "Tal Galili" To: "SML" Cc: r-help at r-project.org Sent: Thursday, October 6, 2011 4:09:43 AM Subject: Re: [R] Mean(s) from values in different row? One way of doing it would be to combine the columns using paste and then use tapply to get the means.
For example: set.seed(32341) a1 = sample(c("a","b"), 100, replace = TRUE) a2 = sample(c("a","b"), 100, replace = TRUE) y = rnorm(100) tapply(y, paste(a1,a2), mean) ----------------Contact Details:------------------------------------------------------- Contact me: Tal.Galili at gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) ---------------------------------------------------------------------------------------------- On Thu, Oct 6, 2011 at 8:40 AM, SML wrote: > Hello: > > Is there a way to get a mean from values stored in different rows? > > The data looks like this: > YEAR-1, JAN, FEB, ..., DEC > YEAR-2, JAN, FEB, ..., DEC > YEAR-3, JAN, FEB, ..., DEC > > What I want is the mean(s) for just the consecutive winter months: > YEAR-1.DEC, YEAR-2.JAN, YEAR-2.FEB > YEAR-2.DEC, YEAR-3.JAN, YEAR-3.FEB > etc. > > Thanks. From henriMone at gmail.com Thu Oct 6 10:59:36 2011 From: henriMone at gmail.com (Henri Mone) Date: Thu, 6 Oct 2011 10:59:36 +0200 Subject: [R] Fitting parabolic function to data Message-ID: Dear R users and experts, I want to fit a shifted parabolic function with the following functional form to my data: f(x)=a0*(x+a1)^2+a2 (a0, a1 and a2 are scaling factors.) What is the standard approach to do this in R? I tried the "lm" function in R but had problems getting the above functional form.
Any help is welcome :) . Greetings, Henri From ligges at statistik.tu-dortmund.de Thu Oct 6 11:34:18 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Thu, 6 Oct 2011 11:34:18 +0200 Subject: [R] R CMD check In-Reply-To: References: Message-ID: <4E8D761A.8000801@statistik.tu-dortmund.de> On 05.10.2011 23:56, Jeff Breiwick wrote: > Richard M. Heiberger temple.edu> writes: > >> >> The next thing to check is this item from doc/manual/R-exts.html >> >> Quoted strings within R-like text are handled specially... >> >> My guess is that the problem is occurring in the .Rd file, not in the .R >> file. >> >> Remove the line, or double the "\" characters. >> >> Rich > > Yes, the error appears to be in the .Rd file so I will modify that. Thanks. Then you have not deleted the function from its examples section, where backslashes have to be escaped once more. Uwe Ligges > > Jeff From jim at bitwrit.com.au Thu Oct 6 11:57:55 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Thu, 06 Oct 2011 20:57:55 +1100 Subject: [R] Does it exist a function for this? In-Reply-To: <1317804140079-3873827.post@n4.nabble.com> References: <1317804140079-3873827.post@n4.nabble.com> Message-ID: <4E8D7BA3.9000404@bitwrit.com.au> On 10/05/2011 07:42 PM, lunarossa wrote: > I have this kind of matrix, with thousands of cases. > > A 2 apple > A 2 peach > A 3 peach > B 1 pear > B 4 peach > B 4 beef > B 7 beef > C 1 peach > D 2 apple > D 5 peach > > I have to distinguish, from the other rows, the rows with "peach" and this > is not a problem.
> > I also have to discriminate the rows with peach like the second one > (associated with the same two cells "A" and "2" as "apple", see first row) > from rows like the 3rd or the 8th, where the first two cells are > "unique" ("A" and "3" or "C" and "1"). > Hi lunarossa, I may be on the wrong track, but you could just stick the three components together: alphanumfruit[,4]<-paste(alphanumfruit[,1], alphanumfruit[,2],alphanumfruit[,3],sep="") and the fourth column of your object (which I suspect is a data frame) will have elements that can be tested for matching or non-matching. More complicated conditions can be accommodated by pasting different combinations of the columns together. Jim From murdoch.duncan at gmail.com Thu Oct 6 12:07:25 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Thu, 06 Oct 2011 06:07:25 -0400 Subject: [R] Fitting parabolic function to data In-Reply-To: References: Message-ID: <4E8D7DDD.4010604@gmail.com> On 11-10-06 4:59 AM, Henri Mone wrote: > Dear R users and experts, > > I want to fit a shifted parabolic function with the following > functional form to my data: > > f(x)=a0*(x+a1)^2+a2 > > (a0, a1 and a2 are scaling factors.) > What is the standard approach to do this in R? I tried the "lm" function > in R but had problems getting the above functional form. > > Any help is welcome :) . That can be expanded into a regular quadratic: (a0*a1^2 + a2) + 2*a0*a1*x + a0*x^2 So fit a regular quadratic, and then solve for a0, a1, a2 from the resulting coefficients. The only tricky bit will be computing errors on the a's. Duncan Murdoch From jim at bitwrit.com.au Thu Oct 6 12:11:50 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Thu, 06 Oct 2011 21:11:50 +1100 Subject: [R] break.axis all range of data In-Reply-To: References: <4E8C2DD4.3070407@bitwrit.com.au> Message-ID: <4E8D7EE6.4070703@bitwrit.com.au> On 10/06/2011 12:31 AM, Heverkuhn Heverkuhn wrote: > ... all the points from 8 to 13.
> > On Wed, Oct 5, 2011 at 8:28 AM, Heverkuhn Heverkuhn wrote: > >> The problem with that function is that it does not really separate the >> two parts of the graph; instead, when style is "gap", it inserts a blank strip that >> covers the axis and points. So, for example, if I insert it at 8 and set the gap >> length to 5, it would cancel all the points from 8 to 10. >> ... >> I would like to increase the distance between x tick-marks 8 and 9, >> and not connect the points x=8 and x=9. Ah, I think I see what you want. You want an axis like this: axis(1,at=c(1:8,10:37),labels=1:36) and to get your points right, you would have to do something like: plot(c(1:8,10:37),1:36,xaxt="n") lines(1:8,1:8) lines(10:37,9:36) first. This is more or less the reverse of the gap.* functions in the plotrix package. Jim From alaios at yahoo.com Thu Oct 6 13:09:26 2011 From: alaios at yahoo.com (Alaios) Date: Thu, 6 Oct 2011 04:09:26 -0700 (PDT) Subject: [R] apply and functions with many arguments In-Reply-To: <4E8D61BD.1010002@knmi.nl> References: <1317888233.78839.YahooMailNeo@web120108.mail.ne1.yahoo.com> <4E8D61BD.1010002@knmi.nl> Message-ID: <1317899366.69069.YahooMailNeo@web120105.mail.ne1.yahoo.com> From E.Vettorazzi at uke.de Thu Oct 6 13:21:52 2011 From: E.Vettorazzi at uke.de (Eik Vettorazzi) Date: Thu, 6 Oct 2011 13:21:52 +0200 Subject: [R] anova.rq {quantreg) - Why do different level of nesting changes the P values?! In-Reply-To: References: Message-ID: <4E8D8F50.9060603@uke.de> Hi Tal, you are comparing different things. The "Details" section of ?anova.rq states "test the hypothesis that smaller models are adequate relative to the largest specified model". So in your first anova you compare fit2 with fit1 and fit0, and in your second attempt fit1 with fit0, so you have different "base" models to compare with, and consequently different p-values. hth.
On 06.10.2011 10:05, Tal Galili wrote: > Hello dear R help members. > > I am trying to understand anova.rq, and I am finding something which I > cannot explain (is it a bug?!): > > The example is for when we have 3 nested models. I run the anova once on > the two models, and again on the three models. I expect that the p-value > for the comparison of model 1 and model 2 would remain the same, whether or > not I add a third model to be compared with. > However, the p-values change, and I do not understand why. > > Here is example code (followed by its output): > > data(barro) > fit0 <- rq(y.net ~ lgdp2 + fse2 , data = barro) > fit1 <- rq(y.net ~ lgdp2 + fse2 + gedy2 , data = barro) > fit2 <- rq(y.net ~ lgdp2 + fse2 + gedy2 + Iy2 , data = barro) > anova(fit0,fit1,fit2, R = 1000) > anova(fit0,fit1, R = 1000) > > > Output: > >> data(barro) >> fit0 <- rq(y.net ~ lgdp2 + fse2 , data = barro) >> fit1 <- rq(y.net ~ lgdp2 + fse2 + gedy2 , data = barro) >> fit2 <- rq(y.net ~ lgdp2 + fse2 + gedy2 + Iy2 , data = barro) >> anova(fit0,fit1,fit2, R = 1000) > Quantile Regression Analysis of Deviance Table > > Model 1: y.net ~ lgdp2 + fse2 + gedy2 + Iy2 > Model 2: y.net ~ lgdp2 + fse2 + gedy2 > Model 3: y.net ~ lgdp2 + fse2 > Df Resid Df F value Pr(>F) > 1 1 156 29.494 2.110e-07 *** > 2 2 156 18.194 7.901e-08 *** > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >> anova(fit0,fit1, R = 1000) > Quantile Regression Analysis of Deviance Table > > Model 1: y.net ~ lgdp2 + fse2 + gedy2 > Model 2: y.net ~ lgdp2 + fse2 > Df Resid Df F value Pr(>F) > 1 1 157 3.9532 0.04852 * > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 
1 >> sessionInfo() > R version 2.13.1 (2011-07-08) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=Hebrew_Israel.1255 LC_CTYPE=Hebrew_Israel.1255 > [3] LC_MONETARY=Hebrew_Israel.1255 LC_NUMERIC=C > [5] LC_TIME=Hebrew_Israel.1255 > > attached base packages: > [1] splines stats graphics grDevices utils datasets methods > base > > other attached packages: > [1] rms_3.3-1 Hmisc_3.8-3 > survival_2.36-9 > [4] colorspace_1.1-0 quantreg_4.71 SparseM_0.89 > > [7] PerformanceAnalytics_1.0.3.2 xts_0.8-2 zoo_1.7-4 > > [10] reporttools_1.0.6 xtable_1.5-6 > > loaded via a namespace (and not attached): > [1] cluster_1.14.0 grid_2.13.1 lattice_0.19-33 tools_2.13.1 -- Eik Vettorazzi Department of Medical Biometry and Epidemiology University Medical Center Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 -- Mandatory disclosures under the German Act on Electronic Commercial Registers, Cooperative Registers and the Company Register (EHUG): Universitätsklinikum Hamburg-Eppendorf; public-law corporation; place of jurisdiction: Hamburg. Board members: Prof. Dr. Guido Sauter (deputy chair), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr.
Uwe Koch-Gromus From alexandrovich at mathematik.uni-marburg.de Thu Oct 6 11:59:34 2011 From: alexandrovich at mathematik.uni-marburg.de (Grigory Alexandrovich) Date: Thu, 06 Oct 2011 11:59:34 +0200 Subject: [R] Problem with .C In-Reply-To: <4E8B3392.30105@statistik.tu-dortmund.de> References: <4D9AE45D.70205@mathematik.uni-marburg.de> <4E8AF632.1030803@mathematik.uni-marburg.de> <4E8B3392.30105@statistik.tu-dortmund.de> Message-ID: <4E8D7C06.4070704@mathematik.uni-marburg.de> Hello, first, thank you for your answers. I did not read the whole PDF "Writing R Extensions", but I read this heavily shortened introduction to the subject: http://www.math.kit.edu/stoch/~lindner/media/.c.call%20extensions.pdf I get the same error with this C function: void test(double * b, int l) { int i; for(i=0; i < l ; i++) b[i] +=i; } I call it from R like this: parameter = c(0,0,1,1,1,0,1.5,0.7,0,1.2,0.3); .C("test", as.double(parameter), as.integer(11)) The program crashes even in this simple case. Where could the error be? Thanks Grigory Alexandrovich Answer 1 > Without knowing that C code, we cannot know. Have you read Writing R Extensions carefully? I.e. take care with memory allocation and printing as mentioned in the manual. > > Uwe Ligges Answer 2 > This looks like a classic case of not reading the manual, and then compounding it by not reading the posting guide. The manual would be the "Writing R Extensions" pdf that comes with R or you can google it. The posting guide is referenced at the bottom of this and every other posting on this mailing list. > There are nearly an infinite variety of errors that can lead to a "crash", so it is really unreasonable of you to pose this question this way and expect constructive assistance. > --------------------------------------------------------------------------- > Jeff Newmiller The ..... ..... Go Live... > DCN: Basics: ##.#. ##.#. Live Go... > Live: OO#.. Dead: OO#.. Playing > Research Engineer (Solar/Batteries O.O#. #.O#.
with > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k > --------------------------------------------------------------------------- > Sent from my phone. Please excuse my brevity. Answer 3 > It's impossible to say, with such minimal information, but a reasonable > guess is that there is a problem with the declaration of "x" and "y" in > foo.c. These would (I think) need to be declared as double *, not double, > when foo is called from .C(). > > cheers, > > Rolf Turner Answer 4 > Hi, > > As others have said, it's very difficult to help you without an example > + code to know what you are talking about. > > That having been said, it seems as if you are just getting your feet > wet in this R <--> C bridge, and I'd recommend you check out the "Rcpp" > and "inline" packages to help make your life a lot easier ... > > -steve > On 04.10.2011 14:04, Grigory Alexandrovich wrote: >> Hello, >> >> I wrote a function in C, which works fine if called from the >> main function in C. >> >> But as soon as I try to call this function from R like .C('foo', >> as.double(x), as.integer(y)), the program crashes. >> >> I created a dll with the cmd command R --arch x64 CMD SHLIB foo.c and >> loaded it into R with dyn.load(). >> >> What can be the cause of such behaviour? >> Again, the C function itself works, but not if called from R. >> >> Thanks >> Grigory Alexandrovich >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and prov