From tlumley at uw.edu Sun May 1 00:06:11 2011
From: tlumley at uw.edu (Thomas Lumley)
Date: Sun, 1 May 2011 10:06:11 +1200
Subject: [R] help with a survplot
In-Reply-To:
References: <20110430164400.531dfe9a@caprica>
Message-ID:

On Sun, May 1, 2011 at 4:49 AM, David Winsemius wrote:
>
> On Apr 30, 2011, at 10:44 AM, Jabba wrote:
>
>> Dear useRs,
>>
>> I was asked to produce a survival curve like this:
>>
>> http://www.palug.net/Members/jabba/immaginetta.png/view
>>
>> with the cardinality of the riskset at the bottom.
>
> The 2nd sentence of help page for survival::survplot says that is an inbuilt
> option.
>

There isn't a survival::survplot. Perhaps you mean rms::survplot?

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland

From jim at bitwrit.com.au Sun May 1 12:45:25 2011
From: jim at bitwrit.com.au (Jim Lemon)
Date: Sun, 01 May 2011 20:45:25 +1000
Subject: [R] plotrix_3.2
Message-ID: <4DBD39C5.1030308@bitwrit.com.au>

Hi all,

plotrix 3.2 has arrived. The reason for this announcement is that there have been a couple of major rewrites.

The barNest family of functions has had an overhaul that began as a fix for an apparently trivial bug that caused problems with empty subcategories. So far, testing has not found anything that will break code written for the former version.

color.scale will now convert numeric values into colors in all three color specifications, RGB, HSV and HCL. Again, I think that previous code will run with the new version. The change that will probably cause trouble is the renaming of the color range arguments from redrange, greenrange and bluerange to cs1, cs2 and cs3. The user can trip up the function more easily, as the three specs have different color parameter ranges.

If the new version does break anyone's code, especially for packages that depend upon plotrix, please let me know and I'll do my best to resolve any problems.

Jim

From jim at bitwrit.com.au Sun May 1 14:29:51 2011
From: jim at bitwrit.com.au (Jim Lemon)
Date: Sun, 01 May 2011 22:29:51 +1000
Subject: [R] using tapply with multiple variables
In-Reply-To:
References:
Message-ID: <4DBD523F.3030609@bitwrit.com.au>

On 05/01/2011 05:28 AM, Kevin Burnham wrote:
> HI All,
>
> I have a long data file generated from a minimal pair test that I gave to
> learners of Arabic before and after a phonetic training regime. For each of
> thirty some subjects there are 800 rows of data, from each of 400 items at
> pre and posttest. For each item the subject got correct, there is a 'C' in
> the column 'Correct'. The line:
>
> tapply(ALLDATA$Correct, ALLDATA$Subject, function(x)sum(x=="C"))
>
> gives me the sum of correct answers for each subject.
>
> However, I would like to have that sum separated by Time (pre or post). Is
> there a simple way to do that?
>
>
> What if I further wish to separate by Group (T or C)?
>
Hi Kevin,
When I looked at this, I immediately thought of the brkdnNest function (which uses tapply internally). In order to get the counts with the current function, I had to create a new variable (newcorrect). However, the idea so attracted me that I programmed it into the code (thanks). Here is a way to get your summary by Subject and Time:

ALLDATA<-data.frame(Subject=rep(1:30,each=800),
 Occasion=factor(rep(c("pre","post"),2400),levels=c("pre","post")),
 Correct=sample(c("C","I"),2400,TRUE))
tapply(ALLDATA$Correct,list(ALLDATA$Subject,ALLDATA$Occasion),
 function(x) sum(x=="C"))
library(plotrix)
brkdnNest(Correct~Subject+Occasion,ALLDATA,FUN="propbrk",trueval="C")
ALLDATA$newcorrect<-ALLDATA$Correct=="C"
brkdnNest(newcorrect~Subject+Occasion,ALLDATA,FUN="sum")

To get the three level breakdown, add another factor:

ALLDATA$Group<-rep(c("T","C"),each=1200)
brkdnNest(newcorrect~Group+Subject+Occasion,ALLDATA,FUN="sum")

Notice that this gives you all of the subjects for each Group, even if they weren't in that Group.
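
A base-R sketch of the three-way breakdown may make the last remark concrete. It assumes a small toy data frame (the names `toy` and `counts` are illustrative and not part of the original data) in which every subject belongs to exactly one group. `tapply` with a list of three factors returns a three-dimensional array, and cells for Subject/Group combinations that never occur come back as NA.

```r
# Toy data: 4 subjects, 2 groups, pre/post, 10 items per occasion
# (all names and sizes here are illustrative assumptions)
set.seed(1)
toy <- data.frame(
  Subject  = rep(1:4, each = 20),
  Group    = rep(c("T", "C"), each = 40),
  Occasion = factor(rep(rep(c("pre", "post"), each = 10), 4),
                    levels = c("pre", "post")),
  Correct  = sample(c("C", "I"), 80, replace = TRUE)
)

# Number of correct answers per Group x Subject x Occasion
counts <- tapply(toy$Correct,
                 list(toy$Group, toy$Subject, toy$Occasion),
                 function(x) sum(x == "C"))

dim(counts)          # groups x subjects x occasions
counts["T", "3", ]   # subject 3 is in group C, so these cells are NA
```

Dropping the all-NA cells afterwards (or using a long-format tool such as `aggregate`) avoids listing subjects under a group they were never in.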
I'll work on that one, for I have just switched to using "tapply" for this breakdown, as it doesn't discard NA values (the cause of the minor bug in barNest).

Jim

From jeroenooms at gmail.com Sun May 1 03:36:38 2011
From: jeroenooms at gmail.com (Jeroen Ooms)
Date: Sat, 30 Apr 2011 18:36:38 -0700 (PDT)
Subject: [R] regressing on variable with heavy tails
Message-ID: <1304213798297-3486973.post@n4.nabble.com>

I have a dependent variable which is very peaked and has heavy tails, something I haven't encountered before (histogram: http://postimage.org/image/2sw9bn8pw/). What could be an appropriate family or transformation to regress on this?

--
View this message in context: http://r.789695.n4.nabble.com/regressing-on-variable-with-heavy-tails-tp3486973p3486973.html
Sent from the R help mailing list archive at Nabble.com.

From aabroadh at ncsu.edu Sun May 1 01:18:50 2011
From: aabroadh at ncsu.edu (Alice Wines)
Date: Sat, 30 Apr 2011 19:18:50 -0400
Subject: [R] indexing into a data.frame using another data.frame that also contains values for replacement
Message-ID:

Hello all,

I have a quandary I have been scratching my head about for a while. I've searched the manual and the web and have not been able to find an acceptable result, so I am hoping for some help.

I have two data frames and I want to index into the first using the second, and replace the specific values I have indexed with more values from the second data.frame. I can do this using a loop, but I wanted a quicker solution with no loops involved.

Although my data set is much larger than this, a small example of what I am trying to do is as follows:

df1 <- data.frame(rows=c("A","B","C", "B", "C", "A"),
columns=c("21_2", "22_2", "23_2", "21_2", "22_2", "23_2"),
values=c(3.3, 2.5, 67.2, 44.3, 53, 66))
df2 <- data.frame(matrix(rep(NA, length(df1$values)),nrow=3, ncol=3))
names(df2) <- c("21_2", "22_2", "23_2")
row.names(df2) <- c("A", "B", "C")

> df1
  rows columns values
1    A    21_2    3.3
2    B    22_2    2.5
3    C    23_2   67.2
4    B    21_2   44.3
5    C    22_2   53.0
6    A    23_2   66.0

> df2
  21_2 22_2 23_2
A   NA   NA   NA
B   NA   NA   NA
C   NA   NA   NA

Note that none of the same locations in df2 are specified twice in df2, so I'm not worried about over-writing it.

I have tried 'mapply' and 'replace', but apparently either they do not work well for this or I don't understand how to use them properly for this purpose. My understanding is that 'replace' needs a vector input and that one cannot create a vector of vectors, so I couldn't pass my indices to 'replace'.

When I tried mapply, the code I used was something like what follows:

df3 <- mapply('[<-' , df2, paste(as.character(df1$rows),
as.character(df1$columns), sep=', '), df1$values)

but it yields the following strange result

> df3
        21_2 22_2 23_2
          NA   NA   NA   NA   NA   NA
          NA   NA   NA   NA   NA   NA
          NA   NA   NA   NA   NA   NA
A, 21_2  3.3  2.5 67.2 44.3   53   66

What I want to see is the following:

>df3
   21_2 22_2 23_2
A   3.3   NA 66.0
B  44.3  2.5   NA
C    NA 53.0 67.2

I will greatly appreciate any help that can be given as I am completely bamboozled by this problem and although I found many useful things in my search for an answer, I did not find out how to do this.

Thanks,

Alice

From wjlee2002 at naver.com Sun May 1 04:50:21 2011
From: wjlee2002 at naver.com (Wonjae Lee)
Date: Sat, 30 Apr 2011 19:50:21 -0700 (PDT)
Subject: [R] Conversion to xlsx file
Message-ID: <1304218221827-3487118.post@n4.nabble.com>

Hi, all

I would like to convert xls files to xlsx files with R commands in the R console, instead of opening each xls file and saving it as xlsx. Please show me how.

Thanks.

Wonjae

--
View this message in context: http://r.789695.n4.nabble.com/Conversion-to-xlsx-file-tp3487118p3487118.html
Sent from the R help mailing list archive at Nabble.com.

From dwinsemius at comcast.net Sun May 1 05:15:49 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Sat, 30 Apr 2011 23:15:49 -0400
Subject: [R] regressing on variable with heavy tails
In-Reply-To: <1304213798297-3486973.post@n4.nabble.com>
References: <1304213798297-3486973.post@n4.nabble.com>
Message-ID: <6BCE37AF-632A-4A41-8116-DDBEC2C79A9C@comcast.net>

On Apr 30, 2011, at 9:36 PM, Jeroen Ooms wrote:

> I have a dependent variable which is very peaked and has heavy tails,
> something I haven't encountered before. (histogram:
> http://postimage.org/image/2sw9bn8pw/). What could be an appropriate
> family
> or transformation to regress on this?

None. You have not yet established that the residuals of your analysis violate any of the linear regression assumptions. It is the residuals that need to be Normal, not the dependent variable.

--
David Winsemius, MD
West Hartford, CT

From xie at yihui.name Sun May 1 05:35:40 2011
From: xie at yihui.name (Yihui Xie)
Date: Sat, 30 Apr 2011 22:35:40 -0500
Subject: [R] Plotting an Underbrace in R
In-Reply-To:
References:
Message-ID:

Oh, I did not see this post and I just saw your message in my blog.
Anyway, here is a solution for other people's future reference:
http://yihui.name/en/2011/04/produce-authentic-math-formulas-in-r-graphics/

Regards,
Yihui
--
Yihui Xie
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

On Fri, Apr 15, 2011 at 4:42 AM, Michael McAssey wrote:
> Baptiste,
>
> Thank you. The examples in the documentation for tikz helped me solve
> this problem, and give me a good tool for future plots. I only need
> to figure out how to incorporate LaTeX packages like amssymb so that I
> can use \mathbb{} to put the real numbers symbol in my plot.
>
> Regards,
>
> Michael
>
> On Fri, Apr 15, 2011 at 10:48 AM, baptiste auguie
> wrote:
>> Hi,
>>
>> Through pgfSweave you can use the tikz device, which is the one that
>> can interpret Latex code (package tikzDevice). I would start with a
>> minimal self-contained plot with this function. see ?tikz for
>> examples.
>>
>> HTH,
>>
>> baptiste
>>
>>
>>
>>
>> On 15 April 2011 20:03, Michael McAssey wrote:
>>> Ben,
>>>
>>> This example of pgfSweave looks like it would address my problem. I
>>> installed this package, but I cannot determine how to use it to make R
>>> convert LaTeX commands into mathematical symbols on an R plot. The
>>> documentation does not seem to address this. I tried to mimic the
>>> example from Yihui Xie that you provided but R only puts the LaTeX
>>> code on my plot without converting it. That is, if in the R GUI I
>>> have:
>>>
>>>> library(pgfSweave)
>>>> plot(1:10, 1:10, "$Y=\\beta_0 + \\beta_1 x + \\epsilon$")
>>>
>>> then in the plot the y-axis label is $Y=\beta_0 + \beta_1 x +
>>> \epsilon$ rather than what I want. I can't find any useful help on
>>> Google. Any further suggestions?
>>>
>>> Thanks.
>>>
>>> Michael
>>>
>>>
>>> On Thu, Apr 14, 2011 at 11:22 PM, Ben Bolker wrote:
>>>>
>>>> Michael McAssey gmail.com> writes:
>>>>
>>>> >
>>>> > I need to include some mathematical expressions in a plot I am creating in
>>>> > R, one of which requires an underbrace, which in LaTeX would be written like
>>>> >
>>>> > \underbrace{T \cdots T}_{n times}
>>>> >
>>>> > There does not appear to be a provision for this in plotmath, and I cannot
>>>> > find anything on the topic in the R-help archive or in a Google search. I
>>>> > would appreciate some help with this.
>>>>
>>>> Does
>>>>
>>>>
>>>>
>>>> help at all?
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>
>>> --
>>> Michael P. McAssey, Ph.D.
>>> Statistics for Life Sciences, Department of Mathematics
>>> Faculty of Sciences, Vrije Universiteit Amsterdam
>>> De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
>>> Office: S-230, W&N building
>>> Tel: +31 (0)20 598 7724
>>> Fax: +31 (0)20 598 7653
>>> Mobile: +31 (0)62 113 8600
>>> http://www.few.vu.nl/~mmy700/
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
>
> --
> Michael P. McAssey, Ph.D.
> Statistics for Life Sciences, Department of Mathematics
> Faculty of Sciences, Vrije Universiteit Amsterdam
> De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
> Office: S-230, W&N building
> Tel: +31 (0)20 598 7724
> Fax: +31 (0)20 598 7653
> Mobile: +31 (0)62 113 8600
> http://www.few.vu.nl/~mmy700/
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From chee.chen at yahoo.com Sun May 1 05:39:43 2011
From: chee.chen at yahoo.com (Chee Chen)
Date: Sat, 30 Apr 2011 23:39:43 -0400
Subject: [R] Question on where samples are grouped in rmvnorm{mvtnorm}
Message-ID: <7976B75F085949799C0B54770DCEC4A5@XbiT>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From SPhillips at Lexington1.net Sun May 1 06:33:41 2011
From: SPhillips at Lexington1.net (Shane Phillips)
Date: Sun, 1 May 2011 00:33:41 -0400
Subject: [R] Simulation Questions
Message-ID:

I have the following script for generating a dataset. It works like a champ except for a couple of things.

1. I need the variables "itbs" and "map" to be negatively correlated with the binomial variable "lunch" (around -0.21 and -0.24, respectively). The binomial variable "lunch" needs to remain unchanged.

2. While my generated variables do come out with the desired means and correlations, the distribution is very narrow and only represents a small portion of the possible scores. Can I force it to encompass a wider range of scores, while maintaining my desired parameters and correlations?

Please help...

Shane

Script follows...

#Number the subjects
subject=1:1000
#Assign a treatment condition from a binomial distribution with a probability of 0.13
treat=rbinom(1*1000,1,.13)
#Assign a lunch status condition from a binomial distribution with a probability of 0.35
lunch=rbinom(1*1000,1,.35)
#Generate age in months from a random normal distribution with mean of 87 and sd of 2
age=rnorm(1000,87,2)
#invoke the MASS package
require(MASS)
#Establish the covariance matrix for MAP, ITBS and CogAT scores
sigma <- matrix(c(1, 0.84, 0.59, 0.84, 1, 0.56, 0.59, 0.56, 1), ncol = 3)
#Establish MAP as a random normal variable with mean of 200 and sd of 9
map <- rnorm(1000, 200, 9)
#Establish ITBS as a random normal variable with mean of 175 and sd of 15
itbs <- rnorm(1000, 175, 15)
#Establish CogAT as a random normal variable with mean of 100 and sd of 16
cogat<-rnorm(1000,100,16)
#Create a dataframe of MAP, ITBS, and CogAT
data <- data.frame(map, itbs, cogat)
#Draw from the multivariate distribution defined by MAP, ITBS, and CogAT means and the covariance matrix
#(colMeans() replaces mean() on a data frame, which no longer returns column means in current R)
sim <- mvrnorm(1000, mu=colMeans(data), sigma, empirical=FALSE)
#Set growth at 0
growth=0
#Combine elements into a single dataset
simtest=data.frame (subject=subject, treat=treat,lunch, age=round(age,0),round(sim,0),growth)
#Set mean growth by treatment condition with treated subjects having a mean growth of 1.5 and non-treated having a mean growth of 0.1
simtest<-transform(simtest, growth=rnorm(1000,m=ifelse(treat==0,0.1,1.5),s=1))
simtest
cor (simtest)

From bhh at xs4all.nl Sun May 1 06:50:29 2011
From: bhh at xs4all.nl (Berend Hasselman)
Date: Sat, 30 Apr 2011 21:50:29 -0700 (PDT)
Subject: [R] indexing into a data.frame using another data.frame that also contains values for replacement
In-Reply-To:
References:
Message-ID: <1304225429735-3487234.post@n4.nabble.com>

Alice Wines wrote:
>
> Hello all,
>
> I have a quandary I have been scratching my head about for a
> while.
> I've searched the manual and the web and have not been able to
> find an acceptable result, so I am hoping for some help.
>
> I have two data frames and I want to index into the first using
> the second, and replace the specific values I have indexed with more
> values from the second data.frame. I can do this using a loop, but I
> wanted a quicker solution with no loops involved.
>
> Although my data set is much larger than this, a small example of what
> I am trying to do is as follows:
>
> df1 <- data.frame(rows=c("A","B","C", "B", "C", "A"),
> columns=c("21_2", "22_2", "23_2", "21_2", "22_2", "23_2"),
> values=c(3.3, 2.5, 67.2, 44.3, 53, 66))
> df2 <- data.frame(matrix(rep(NA, length(df1$values)),nrow=3, ncol=3))
> names(df2) <- c("21_2", "22_2", "23_2")
> row.names(df2) <- c("A", "B", "C")
>
>> df1
>   rows columns values
> 1    A    21_2    3.3
> 2    B    22_2    2.5
> 3    C    23_2   67.2
> 4    B    21_2   44.3
> 5    C    22_2   53.0
> 6    A    23_2   66.0
>
> .......
>
> What I want to see is the following:
>
>>df3
>    21_2 22_2 23_2
> A   3.3   NA 66.0
> B  44.3  2.5   NA
> C    NA 53.0 67.2
>

How about this (I have converted df2 into a matrix)

df2 <- matrix(rep(NA, length(df1$values)),nrow=3, ncol=3)
colnames(df2) <- c("21_2", "22_2", "23_2")
rownames(df2) <- c("A", "B", "C")

df2[cbind(df1$rows,df1$columns)] <- df1$values
# convert df2 to a data.frame

Berend

--
View this message in context: http://r.789695.n4.nabble.com/indexing-into-a-data-frame-using-another-data-frame-that-also-contains-values-for-replacement-tp3487147p3487234.html
Sent from the R help mailing list archive at Nabble.com.
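
A minimal sketch of the matrix-indexing idea above, with one defensive assumption added: the key columns are passed through `as.character()` so that cells are looked up by row and column name whether or not `df1` stores them as factors (with factors, `cbind()` would yield integer level codes, which only match here because the level order happens to equal the dimnames order). The names `m`, `idx` and `df3` are illustrative.

```r
df1 <- data.frame(rows    = c("A", "B", "C", "B", "C", "A"),
                  columns = c("21_2", "22_2", "23_2", "21_2", "22_2", "23_2"),
                  values  = c(3.3, 2.5, 67.2, 44.3, 53, 66))

# Empty matrix carrying the target row and column names
m <- matrix(NA_real_, nrow = 3, ncol = 3,
            dimnames = list(c("A", "B", "C"), c("21_2", "22_2", "23_2")))

# A two-column character matrix indexes cells by (row name, column name)
idx <- cbind(as.character(df1$rows), as.character(df1$columns))
m[idx] <- df1$values

# Back to a data frame if one is needed
df3 <- as.data.frame(m)
df3
```

No loop is involved: the whole assignment is a single vectorised indexing operation.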
From djmuser at gmail.com Sun May 1 07:03:24 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Sat, 30 Apr 2011 22:03:24 -0700
Subject: [R] using tapply with multiple variables
In-Reply-To:
References:
Message-ID:

Hi:

If you have R 2.11.x or later, one can use the formula version of aggregate():

aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = function(x) sum(x == 'C'))

A variety of contributed packages (plyr, data.table, doBy, sqldf and remix, among others) have similar capabilities.

If you want some additional summaries (e.g., percent correct), here is an example function for a single subject/group that aggregate() can use to propagate to all subgroups and subjects (I encourage you to play with it):

f <- function(x) {
    Correct <- sum(x == 'C')
    Percent <- round(100 * Correct/length(x), 3)
    c(Number = Correct, Percent = Percent)
}

aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = f)

The particular function isn't as important as knowing you can do this sort of thing. Several of the contributed packages indicated above have similar, if not superior, capabilities, depending on the situation.

Toy example to test the above:

dd <- data.frame(Subject = rep(1:5, each = 100),
                 Group = rep(rep(c('C', 'T'), each = 50), 5),
                 Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C')))

aggregate(Correct ~ Subject + Group, data = dd, FUN = function(x) sum(x == 'C'))
   Subject Group Correct
1        1     C      40
2        2     C      36
3        3     C      39
4        4     C      37
5        5     C      41
6        1     T      43
7        2     T      45
8        3     T      37
9        4     T      45
10       5     T      36

aggregate(Correct ~ Subject + Group, data = dd, FUN = f)
   Subject Group Correct.Number Correct.Percent
1        1     C             40              80
2        2     C             36              72
3        3     C             39              78
4        4     C             37              74
5        5     C             41              82
6        1     T             43              86
7        2     T             45              90
8        3     T             37              74
9        4     T             45              90
10       5     T             36              72

HTH,
Dennis

On Sat, Apr 30, 2011 at 12:28 PM, Kevin Burnham wrote:
> HI All,
>
> I have a long data file generated from a minimal pair test that I gave to
> learners of Arabic before and after a phonetic training regime.
> For each of
> thirty some subjects there are 800 rows of data, from each of 400 items at
> pre and posttest. For each item the subject got correct, there is a 'C' in
> the column 'Correct'. The line:
>
> tapply(ALLDATA$Correct, ALLDATA$Subject, function(x)sum(x=="C"))
>
> gives me the sum of correct answers for each subject.
>
> However, I would like to have that sum separated by Time (pre or post). Is
> there a simple way to do that?
>
>
> What if I further wish to separate by Group (T or C)?
>
> Thanks,
> Kevin
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From djmuser at gmail.com Sun May 1 07:09:34 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Sat, 30 Apr 2011 22:09:34 -0700
Subject: [R] indexing into a data.frame using another data.frame that also contains values for replacement
In-Reply-To:
References:
Message-ID:

Hi:

Here are two possibilities:

df1 <- data.frame(rows=c("A","B","C", "B", "C", "A"),
                  columns=c("21_2", "22_2", "23_2", "21_2", "22_2", "23_2"),
                  values=c(3.3, 2.5, 67.2, 44.3, 53, 66))

with(df1, xtabs(values ~ rows + columns))
    columns
rows 21_2 22_2 23_2
   A  3.3  0.0 66.0
   B 44.3  2.5  0.0
   C  0.0 53.0 67.2

library(reshape2)
dcast(df1, rows ~ columns)
Using values as value column: use value_var to override.
  rows 21_2 22_2 23_2
1    A  3.3   NA 66.0
2    B 44.3  2.5   NA
3    C   NA 53.0 67.2

HTH,
Dennis

On Sat, Apr 30, 2011 at 4:18 PM, Alice Wines wrote:
> Hello all,
>
>     I have a quandary I have been scratching my head about for a
> while. I've searched the manual and the web and have not been able to
> find an acceptable result, so I am hoping for some help.
>
>     I have two data frames and I want to index into the first using
> the second, and replace the specific values I have indexed with more
> values from the second data.frame. I can do this using a loop, but I
> wanted a quicker solution with no loops involved.
>
> Although my data set is much larger than this, a small example of what
> I am trying to do is as follows:
>
> df1 <- data.frame(rows=c("A","B","C", "B", "C", "A"),
> columns=c("21_2", "22_2", "23_2", "21_2", "22_2", "23_2"),
> values=c(3.3, 2.5, 67.2, 44.3, 53, 66))
> df2 <- data.frame(matrix(rep(NA, length(df1$values)),nrow=3, ncol=3))
> names(df2) <- c("21_2", "22_2", "23_2")
> row.names(df2) <- c("A", "B", "C")
>
>> df1
>   rows columns values
> 1    A    21_2    3.3
> 2    B    22_2    2.5
> 3    C    23_2   67.2
> 4    B    21_2   44.3
> 5    C    22_2   53.0
> 6    A    23_2   66.0
>
>
>
>> df2
>   21_2 22_2 23_2
> A   NA   NA   NA
> B   NA   NA   NA
> C   NA   NA   NA
>
>
>     Note that none of the same locations in df2 are specified twice
> in df2, so I'm not worried about over-writing it.
>
>     I have tried 'mapply' and 'replace', but apparently either they do
> not work well for this or I don't understand how to use them properly
> for this purpose. My understanding is that 'replace' needs a vector
> input and that one cannot create a vector of vectors, so I couldn't
> pass my indices to 'replace'.
>
>     When I tried mapply, the code I used was something like what follows:
>
> df3 <- mapply('[<-' , df2, paste(as.character(df1$rows),
> as.character(df1$columns), sep=', '), df1$values)
>
> but it yields the following strange result
>
>> df3
>         21_2 22_2 23_2
>           NA   NA   NA   NA   NA   NA
>           NA   NA   NA   NA   NA   NA
>           NA   NA   NA   NA   NA   NA
> A, 21_2  3.3  2.5 67.2 44.3   53   66
>
>
> What I want to see is the following:
>
>>df3
>    21_2 22_2 23_2
> A   3.3   NA 66.0
> B  44.3  2.5   NA
> C    NA 53.0 67.2
>
>
>     I will greatly appreciate any help that can be given as I am
> completely bamboozled by this problem and although I found many useful
> things in my search for an answer, I did not find out how to do this.
>
> Thanks,
>
> Alice
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From azamjaafari at yahoo.com Sun May 1 09:08:59 2011
From: azamjaafari at yahoo.com (azam jaafari)
Date: Sun, 1 May 2011 00:08:59 -0700 (PDT)
Subject: [R] vector file
Message-ID: <275397.48054.qm@web37105.mail.mud.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From b.rowlingson at lancaster.ac.uk Sun May 1 09:50:05 2011
From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson)
Date: Sun, 1 May 2011 08:50:05 +0100
Subject: [R] vector file
In-Reply-To: <275397.48054.qm@web37105.mail.mud.yahoo.com>
References: <275397.48054.qm@web37105.mail.mud.yahoo.com>
Message-ID:

On Sun, May 1, 2011 at 8:08 AM, azam jaafari wrote:
> Dear All
>
> I want to import the vector file (.shp) to R. I could import the file by rgdal package before, by following:
>
> geology<-readOGR('C:/geology//saga/geo.geom','finalgeology')
>
> but now there is an error:
>
> Error in ogrInfo(dsn = dsn, layer = layer, input_field_name_encoding = input_field_name_encoding) :
>
>         GDAL Error 4: .shx file is unreadable, or corrupt.
>
> Can you tell me where is the problem.

I would say that the .shx file is unreadable, or corrupt.

Do you even have a .shx file? For every .shp file, there should be a .shx file, and a .dbf file. .shp files on their own are useless - you will need at least finalgeology.shp, finalgeology.shx and finalgeology.dbf. There may also be a finalgeology.prj with the projection info.
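
The check for companion files described above can be sketched in base R. The helper name `missing_sidecars` is hypothetical (it is not part of rgdal), and the directory and layer name below are simply the ones from the quoted `readOGR` call; they are assumptions about where the files live.

```r
# Report which required shapefile components are missing.
# `dir` and `layer` are the two arguments that were given to readOGR().
missing_sidecars <- function(dir, layer, exts = c("shp", "shx", "dbf")) {
  paths <- file.path(dir, paste0(layer, ".", exts))
  exts[!file.exists(paths)]
}

# With the path from the original call; an empty result means all three exist
missing_sidecars("C:/geology//saga/geo.geom", "finalgeology")
```

If `"shx"` shows up in the result, the file is missing rather than corrupt, and it would need to be restored from wherever the shapefile was originally obtained.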

We don't know what's changed between when you said it worked and now, so we can't really tell. The only clue is the error about the .shx file - so check you still have the .shx file and you didn't delete it.

Barry

From dwinsemius at comcast.net Sun May 1 10:38:41 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Sun, 1 May 2011 04:38:41 -0400
Subject: [R] help with a survplot
In-Reply-To:
References: <20110430164400.531dfe9a@caprica>
Message-ID: <85D19B4D-6ED7-4A15-BC43-14F8F885C062@comcast.net>

On Apr 30, 2011, at 6:06 PM, Thomas Lumley wrote:

> On Sun, May 1, 2011 at 4:49 AM, David Winsemius
> wrote:
>>
>> On Apr 30, 2011, at 10:44 AM, Jabba wrote:
>>
>>> Dear useRs,
>>>
>>> I was asked to produce a survival curve like this:
>>>
>>> http://www.palug.net/Members/jabba/immaginetta.png/view
>>>
>>> with the cardinality of the riskset at the bottom.
>>
>> The 2nd sentence of help page for survival::survplot says that is
>> an inbuilt
>> option.
>>
>
> There isn't a survival::survplot. Perhaps you mean rms::survplot?

Apologies. I was confused about which package it was in. I generally load survival by require()-ing rms, and thought (without looking at what was plain to see) that it was from survival.

>
> -thomas
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland

David Winsemius, MD
West Hartford, CT

From mailzhuyao at gmail.com Sun May 1 11:01:54 2011
From: mailzhuyao at gmail.com (zhu yao)
Date: Sun, 1 May 2011 17:01:54 +0800
Subject: [R] Different results of coefficients by packages penalized and glmnet
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From savicky at praha1.ff.cuni.cz Sun May 1 12:10:13 2011
From: savicky at praha1.ff.cuni.cz (Petr Savicky)
Date: Sun, 1 May 2011 12:10:13 +0200
Subject: [R] Question on where samples are grouped in rmvnorm{mvtnorm}
In-Reply-To: <7976B75F085949799C0B54770DCEC4A5@XbiT>
References: <7976B75F085949799C0B54770DCEC4A5@XbiT>
Message-ID: <20110501101013.GA8999@praha1.ff.cuni.cz>

On Sat, Apr 30, 2011 at 11:39:43PM -0400, Chee Chen wrote:
> Dear All,
> For function: rmvnorm{mvtnorm} in (library mvtnorm, not splus2R), if I generate 2 bivariate normal samples as follows:
> > rmvnorm(2,mean=rep(0,2),sigma=diag(2))
>            [,1]      [,2]
> [1,]  2.0749459 1.4932752
> [2,] -0.9886333 0.3832266
>
> Where is the first sample, it is stored in the first row or the first column?
> Does this function store samples row-wise or column-wise?

Hi.

The call

  rmvnorm(3,mean=rep(0,2),sigma=diag(2))

produces

             [,1]      [,2]
  [1,]  2.5795462 0.7862570
  [2,]  0.2630356 0.3879757
  [3,] -0.5035942 0.5804228

which shows that the samples are rows.

The arrangement that rows are the vectors of observations and columns are variables is typical in R. The examples in ?rmvnorm demonstrate this, since the mean is computed using colMeans() and variance using var(). See also ?var.

Hope this helps.

Petr Savicky.

From wildscop at hotmail.com Sun May 1 09:34:08 2011
From: wildscop at hotmail.com (Ehsan Karim)
Date: Sun, 1 May 2011 00:34:08 -0700
Subject: [R] Longitudinal data with non-randomized subjects
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From matevz.pavlic at gi-zrmk.si Sun May 1 13:08:16 2011
From: matevz.pavlic at gi-zrmk.si (Matevž Pavlič)
Date: Sun, 1 May 2011 13:08:16 +0200
Subject: [R] QQ plot for normality testing
In-Reply-To:
References:
Message-ID:

Thanks for the answer and for the link. I was looking for a way to search through the forum posts....

So the slope of the line is not important as long as the data is approx. on the line?
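
On the slope question, a small sketch with simulated data may help (here `w` is only a stand-in for the water-content values, and the mean and sd are arbitrary assumptions). For a sample that is close to normal, the line drawn by `qqline()` has an intercept near the sample mean and a slope near the sample standard deviation, so the slope reflects the spread of the data rather than how normal it is. What matters for the normality check is whether the points follow the line.

```r
set.seed(42)
w <- rnorm(300, mean = 25, sd = 4)   # simulated stand-in for water content

qqnorm(w)
qqline(w)

# The same line computed by hand: it passes through the 1st and 3rd quartiles
y <- quantile(w, c(0.25, 0.75))      # sample quartiles
x <- qnorm(c(0.25, 0.75))            # theoretical normal quartiles
slope     <- unname(diff(y) / diff(x))
intercept <- unname(y[1] - slope * x[1])

c(intercept = intercept, slope = slope)   # compare with mean(w) and sd(w)
```

Rescaling `w` (for example, changing units) changes the slope but leaves the pattern of points around the line unchanged.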

Thanks, m

-----Original Message-----
From: Joshua Wiley [mailto:jwiley.psych at gmail.com]
Sent: Saturday, April 30, 2011 8:04 PM
To: Matevž Pavlič
Cc: r-help at r-project.org
Subject: Re: [R] QQ plot for normality testing

Hi,

qqnorm basically plots your actual sample values against what the values would be (approximately) if they were from a normal distribution. qqline() adds a line through the 1st and 3rd quartiles. So roughly speaking, if your QQ plot forms a straight line (particularly the one drawn by qqline), then your sample values match a normal distribution.

With real data, it is typically not a "yes/no" decision, rather "is my data normal enough?" Questions like this have been asked many times on this list, so searching the mailing list archives will lead you to many more discussions and suggestions. Here is one way to search: http://tolstoy.newcastle.edu.au/R/

Cheers,

Josh

On Sat, Apr 30, 2011 at 10:27 AM, Matevž Pavlič wrote:
> Hi all,
>
> I am trying to test whether the distribution of my samples is normal with a QQ plot.
>
> I have values of water content in clays for around a few hundred samples. Is the code:
>
> qqnorm(w)      #w being water content
>
> qqline(w)
>
> sufficient?
>
> How do I know when I get the plots which distribution is normal and which is not?
>
> Thanks, m
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

From murdoch.duncan at gmail.com Sun May 1 13:38:21 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Sun, 01 May 2011 07:38:21 -0400
Subject: [R] create namespace without creating a package?
In-Reply-To: <201104302225407402915@gmail.com>
References: <201104302225407402915@gmail.com>
Message-ID: <4DBD462D.5040207@gmail.com>

On 30/04/11 10:26 AM, xiagao1982 wrote:
> Hi all,
>
> I am a C++/C# programmer who is new to R. I would like to use something like "namespace" to organize my functions without creating a package. How can I do this? Thanks!

You could do it with the local() function or other explicit use of environments, but it's a bad idea. Use a package: that's the R way to do it.

Duncan Murdoch

From murdoch.duncan at gmail.com Sun May 1 13:42:23 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Sun, 1 May 2011 07:42:23 -0400
Subject: [R] Sys.getenv at startup is not working properly
In-Reply-To:
References:
Message-ID: <4DBD471F.1030507@gmail.com>

Please upgrade to a current release (or R-patched). Version 2.10.1 is quite old.

Duncan Murdoch

On 30/04/11 3:18 PM, Oliver wrote:
> Hello,
>
> when using
>
> Sys.getenv() during startup-phase (.First or .Rprofile)
> to get the env-variables
> COLUMNS as well as HOST I get empty strings.
>
> After the startup is done, when asking via Sys.getenv()
> by hand, COLUMNS is set (but HOST is not, even "hostname" on the shell gives me
> a correct answer).
>
> At the moment my problem is the missing COLUMNS value during start up,
> because I want to set the linewidth for printing via
>
> options(width=Sys.getenv("COLUMNS"))
>
> automatically at startup.
>
> When using
> $ R CMD BATCH myscript.R
> the same problem occurs, but then at least I can understand
> the case (but interpreting "" as "0" by the options() would be better,
> because the option-setting would then not break the script; it does break the
> script, when COLUMNS is "").
>
> (btw: Is there a possibility to decide if the script is running in batch mode or
> interactively? This could be a workaround for "" not interpreted as "0".)
>
> The setting with the options/Sys.getenv() works, when typed in by hand
> after startup is completed, as well as when sourcing-in a script that
> contains such a options/Sys.getenv-command.
>
> here is, what R.version contains:
> ==============================================
>                _
> platform       x86_64-pc-linux-gnu
> arch           x86_64
> os             linux-gnu
> system         x86_64, linux-gnu
> status
> major          2
> minor          10.1
> year           2009
> month          12
> day            14
> svn rev        50720
> language       R
> version.string R version 2.10.1 (2009-12-14)
> ==============================================
>
> Is this problem fixed in newer releases?
> Or if not: how can I inform the R developers, so that they can
> pick it up?
> (Some R developers might be on this list?!)
>
> Ciao,
> Oliver

From ggrothendieck at gmail.com Sun May 1 15:15:24 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Sun, 1 May 2011 09:15:24 -0400
Subject: [R] create namespace without creating a package?
In-Reply-To: <201104302225407402915@gmail.com>
References: <201104302225407402915@gmail.com>
Message-ID:

On Sat, Apr 30, 2011 at 10:26 AM, xiagao1982 wrote:
> Hi all,
>
> I am a C++/C# programmer who is new to R. I would like to use something like "namespace" to organize my functions without creating a package. How can I do this? Thanks!
>

You can arrange them in classes using reference classes or the R.oo package or you could arrange them in proto objects. For the last one see the section on traits in the proto package.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From ligges at statistik.tu-dortmund.de Sun May 1 17:52:40 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Sun, 01 May 2011 17:52:40 +0200
Subject: [R] plot several histograms with same y-axes scaling using hist()
In-Reply-To: <4DBAB195.3060503@bitwrit.com.au>
References: <1304073341114-3483376.post@n4.nabble.com> <4DBAB195.3060503@bitwrit.com.au>
Message-ID: <4DBD81C8.3070604@statistik.tu-dortmund.de>

On 29.04.2011 14:39, Jim Lemon wrote:
> On 04/29/2011 08:35 PM, hck wrote:
>> Dear all
>>
>> Problem: hist()-function, scale = "percent"
>>
>> I want to generate histograms for changing underlying data. In order to make
>> them comparable, I want to fix the y-axis (vertical-axis) to, e.g., 0%, 10%,
>> 20%, 30% as well as to fix the spaces, too. So the y-axis in each histogram
>> should be identical. Currently, I have 100 histograms and the y-axis scale
>> changes in each.
>>
>> Here is my code:
>>
>> ="Hist(na.exclude("&AA3&"), breaks=50, col=""seashell3"",
>> scale=""percent"",xlim=c(-1, 1), xlab=""Bewertungsfehler"",
>> ylab=""Haeufigkeit (in %)"", main=""KBV"", border=""white"")"
>>
>> I tried the ylim=c(...), but unfortunately it does not work.
>
> Hi Hans,
> The "barp" function in plotrix can plot histograms (see the last example
> on the help page) and may be flexible enough to do what you want.

or just use a fixed ylim?

Uwe

> Jim
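
The fixed-ylim suggestion in the reply above can be sketched in base R roughly as follows. This is only an illustration under assumptions: it uses base hist() rather than the Hist() call from the question, the data in 'samples' are made up, and the 0 to 30 percent range is simply the one mentioned by the original poster.

```r
# Minimal sketch: percent-scaled histograms that all share one y-axis.
# 'samples' is hypothetical data standing in for the real data sets.
set.seed(1)
samples <- list(sample1 = rnorm(200, 0, 0.3),
                sample2 = rnorm(200, 0.1, 0.2))

breaks <- seq(-1, 1, length.out = 51)   # 50 equal-width bins on [-1, 1]
ylim   <- c(0, 30)                      # same y-range (in %) for every plot

hists <- lapply(samples, function(x) {
  # drop NAs and values outside the breaks, since hist() requires
  # the breaks to span the data
  x <- x[!is.na(x) & x >= -1 & x <= 1]
  h <- hist(x, breaks = breaks, plot = FALSE)
  h$counts <- 100 * h$counts / sum(h$counts)   # counts -> percent
  h
})

# plot.histogram() draws h$counts when freq = TRUE, so every plot
# shows percentages on an identical y-axis
for (nm in names(hists)) {
  plot(hists[[nm]], freq = TRUE, ylim = ylim,
       col = "seashell3", border = "white",
       xlab = "Bewertungsfehler", ylab = "Haeufigkeit (in %)", main = nm)
}
```

Because the breaks and ylim are set once outside the loop, the axes cannot drift from one data set to the next; any bar taller than the chosen upper limit would be clipped, so the limit has to be picked with the data in mind.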
From ligges at statistik.tu-dortmund.de Sun May 1 18:01:04 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Sun, 01 May 2011 18:01:04 +0200
Subject: [R] Question on where samples are grouped in rmvnorm{mvtnorm}
In-Reply-To: <7976B75F085949799C0B54770DCEC4A5@XbiT>
References: <7976B75F085949799C0B54770DCEC4A5@XbiT>
Message-ID: <4DBD83C0.3040308@statistik.tu-dortmund.de>

On 01.05.2011 05:39, Chee Chen wrote:
> Dear All,
> For function: rmvnorm{mvtnorm} in (library

No, it is a package, not a library!

> mvtnorm, not splus2R), if I generate 2 bivariate normal samples as follows:
>> rmvnorm(2,mean=rep(0,2),sigma=diag(2))
>            [,1]      [,2]
> [1,]  2.0749459 1.4932752
> [2,] -0.9886333 0.3832266
>
> Where is the first sample, it is stored in the first row or the first column?
> Does this function store samples row-wise or column-wise?

Hmmm, you could do something to find out which is much faster than writing this message: Try to generate 3 bivariate samples and look if you got 3 rows or 3 columns now!

Uwe Ligges

> Thank you,
> -Chee
> [[alternative HTML version deleted]]

From ligges at statistik.tu-dortmund.de Sun May 1 18:02:50 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Sun, 01 May 2011 18:02:50 +0200
Subject: [R] Speed up code with for() loop
In-Reply-To: <1304108415156-3484548.post@n4.nabble.com>
References: <1304011655058-3481680.post@n4.nabble.com> <1304108415156-3484548.post@n4.nabble.com>
Message-ID: <4DBD842A.8040909@statistik.tu-dortmund.de>

On 29.04.2011 22:20, hck wrote:
> Barth sent me a very good code and I modified it a bit.
Have a look:
>
> Error<-rnorm(10000000, mean=0, sd=0.05)
> estimate<-(log(1+0.10)+Error)
>
> DCF_korrigiert<-(1/(exp(1/(exp(0.5*(-estimate)^2/(0.05^2))*sqrt(2*pi/(0.05^2
> ))*(1-pnorm(0,((-estimate)/(0.05^2)),sqrt(1/(0.05^2))))))-1))
> DCF_verzerrt<-(1/(exp(estimate)-1))
>
> S<- 10000000 # total sample size
> D<- 10000 # number of subsamples
> Subset<- 10000 # number in each subsample
> Select<- matrix(sample(S,D*Subset,replace=TRUE),nrow=Subset,ncol=D)
>
> DCF_korrigiert_select<- matrix(DCF_korrigiert[Select],nrow=Subset,ncol=D)
> Delta_ln<-(log(colMeans(DCF_korrigiert_select, na.rm=T)/(1/0.10)))
>
> The only problem I discovered is that R cannot handle more than
> 2.147.483.647 integers, thus the cells in the matrix are bounded by this
> condition. (R shows the max by typing: .Machine$integer.max). And if you
> want to save the workspace, the file with 10.000 times 10.000 becomes around
> 2 GB. Compared to the original of "just" 300 MB.
>
> So I cannot perform my previous bootstrap with 1.000.000 times 100.000. But
> nevertheless 10.000 times 10.000 seems to be sufficient; I have to say it's
> amazing, how fast the idea works.
>
> Has anybody a suggestion how to make it work for the 1.000.000 times 100.000
> bootstrap???

Run it in several blocks of matrices with appropriate dimensions? This allows easy parallelization as well.

Uwe Ligges

> --
> View this message in context: http://r.789695.n4.nabble.com/Speed-up-code-with-for-loop-tp3481680p3484548.html
> Sent from the R help mailing list archive at Nabble.com.
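
The block-wise idea in the reply above might look roughly like the sketch below. It is an illustration only: the vector x is simulated and merely stands in for DCF_korrigiert, the sizes are scaled down so the example runs quickly, and the block size of 100 columns is an arbitrary assumption to be tuned to the available memory.

```r
# Minimal sketch: bootstrap column means computed block by block, so that
# only 'block' columns of the index matrix exist in memory at any time.
set.seed(42)
x      <- rnorm(1e5)   # hypothetical stand-in for DCF_korrigiert
D      <- 1000         # number of subsamples (columns)
Subset <- 500          # number of draws in each subsample (rows)
block  <- 100          # columns handled per block (assumed; tune to memory)

res <- numeric(D)
for (start in seq(1, D, by = block)) {
  idx <- start:min(start + block - 1, D)        # columns in this block
  sel <- matrix(sample(length(x), Subset * length(idx), replace = TRUE),
                nrow = Subset)                  # resampling indices
  res[idx] <- colMeans(matrix(x[sel], nrow = Subset), na.rm = TRUE)
}
```

Each pass of the loop is independent of the others, which is why the blocks could also be handed to separate workers; only the short result vector res has to be kept, not the full index matrix.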
From ligges at statistik.tu-dortmund.de Sun May 1 18:05:33 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Sun, 01 May 2011 18:05:33 +0200
Subject: [R] Problems downloading statmod cran package
In-Reply-To:
References:
Message-ID: <4DBD84CD.7050404@statistik.tu-dortmund.de>

Hmmm, your subject says you are going to get statmod,

On 28.04.2011 04:31, Jim Silverton wrote:
> Hello all,
> I keep on getting the following error message when I try downloading
> statmod:
>
>> install.packages("statmod")
> Installing package(s) into 'C:\Users\Isaac\Documents/R/win-library/2.12'
> (as 'lib' is unspecified)
> trying URL '
> http://www.revolution-computing.com/cran/bin/windows/contrib/2.12/statmod_1.4.9.zip
> '
> Error in download.file(url, destfile, method, mode = "wb", ...) :
> cannot open URL '
> http://www.revolution-computing.com/cran/bin/windows/contrib/2.12/statmod_1.4.9.zip
> '
> In addition: Warning message:
> In download.file(url, destfile, method, mode = "wb", ...) :
> cannot open: HTTP status was '404 Not Found'
> Warning in download.packages(pkgs, destdir = tmpd, available = available, :
> download of package 'statmod' failed
>>
>
> Does anyone have a solution?

Yes: Try again. I think you managed to try this while the mirror you used was updated (and statmod 1.4.10 was already on the mirror while the PACKAGES file was not updated and told your R that 1.4.9 was recent).

Uwe Ligges
From ligges at statistik.tu-dortmund.de Sun May 1 18:11:09 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Sun, 01 May 2011 18:11:09 +0200
Subject: [R] Element by Element addition of the columns of a Matrix
In-Reply-To: <1304082379234-3483628.post@n4.nabble.com>
References: <1304079713131-3483545.post@n4.nabble.com> <1304082379234-3483628.post@n4.nabble.com>
Message-ID: <4DBD861D.7080803@statistik.tu-dortmund.de>

On 29.04.2011 15:06, Pete Brecknock wrote:
> ... is the apply function what you are looking for?
>
> A=matrix(1,2,4)
>
> apply(A,1,sum)

Thanks for providing answers to R-help, but:

1. Please quote the original question for the mailing list readers among us.
2. Please reply also to the original poster who may not be subscribed to the list.
3. In order to do the above more efficiently, you actually want to use rowSums(A) rather than the apply() version given above.

Best,
Uwe Ligges

> HTH
>
> Pete
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Element-by-Element-addition-of-the-columns-of-a-Matrix-tp3483545p3483628.html
> Sent from the R help mailing list archive at Nabble.com.

From joelr1 at gmail.com Sun May 1 15:33:28 2011
From: joelr1 at gmail.com (Joel Reymont)
Date: Sun, 1 May 2011 14:33:28 +0100
Subject: [R] microsecond timestamp support
Message-ID: <71BB248B-ECF8-4E46-9FCD-050401E860B4@gmail.com>

Does R have support for microseconds in timestamps, e.g.
when reading this in

"Time","Include","Kind","Duration"
2011-04-01 14:20:36.368324,Y,U,1.03238296509
2011-04-01 14:20:35.342732,Y,C,0.0252721309662
2011-04-01 14:20:34.337209,Y,R,0.00522899627686

Thanks, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

From rajreni.kaul at gmail.com Sun May 1 16:52:05 2011
From: rajreni.kaul at gmail.com (Ren)
Date: Sun, 1 May 2011 07:52:05 -0700 (PDT)
Subject: [R] Dummy variables using rfe in caret for variable selection
Message-ID: <1304261525517-3487861.post@n4.nabble.com>

I'm trying to run "rfe" for variable selection in the caret package, and am getting an error. My data frame includes a dummy variable with 3 levels.

x <- chlDescr
y <- chl

#create dummy variable
levels(x$State) <- c("AL","GA","FL")
dummy <- model.matrix(~State,x)
z <- cbind(dummy, x)

#remove State category variable
w <- z[,c(-4)]

subsets <- c(2:8)
ctrl<- rfeControl(functions = lmFuncs, method="cv", verbose=FALSE, returnResamp = "final")
lmProfile <- rfe(w, y, sizes = subsets, rfeControl = ctrl)

Returns:

Error in `[.data.frame`(x, , retained, drop = FALSE) :
  undefined columns selected
In addition: Warning message:
In predict.lm(object, x) :
  prediction from a rank-deficient fit may be misleading

When I remove the dummy variables the function runs fine.

#remove State variable
Desc <- chlDescr[,-c(1)]
lmProfile <- rfe(Desc, y, sizes = subsets, rfeControl = ctrl)

Returns:

Recursive feature selection

Outer resamping method was 10 iterations of cross-validation.
Resampling performance over subset size:

 Variables   RMSE Rsquared  RMSESD RsquaredSD Selected
         1 0.2462   0.7454 0.09529    0.17362
         2 0.2408   0.7680 0.07860    0.12543
         3 0.2134   0.8285 0.06649    0.09043
         4 0.2011   0.8609 0.03463    0.05928        *
         5 0.2019   0.8622 0.03421    0.05675
         6 0.2019   0.8622 0.03421    0.05675

Can lmFuncs handle dummy variables? How does it need to be modified so it can? I'm new at this so any help would be appreciated, thanks.

Reni

http://r.789695.n4.nabble.com/file/n3487861/chl.csv chl.csv
http://r.789695.n4.nabble.com/file/n3487861/chlDescr.csv chlDescr.csv

--
View this message in context: http://r.789695.n4.nabble.com/Dummy-variables-using-rfe-in-caret-for-variable-selection-tp3487861p3487861.html
Sent from the R help mailing list archive at Nabble.com.

From EN3X at hscmail.mcc.virginia.edu Sun May 1 17:48:00 2011
From: EN3X at hscmail.mcc.virginia.edu (Nemergut, Edward *HS)
Date: Sun, 1 May 2011 11:48:00 -0400
Subject: [R] Mean/SD of Each Position in Table
Message-ID:

I have 100+ .csv files which have the basic format:

> test
        X Substance1 Substance2 Substance3 Substance4 Substance5
1   Time1         10          0          0          0          0
2   Time2          9          5          0          0          0
3   Time3          8         10          1          0          0
4   Time4          7         20          2          1          0
5   Time5          6         25          3          2          1
6   Time6          5         30          4          2          2
7   Time7          4         25          5          3          3
8   Time8          3         20          6          3          4
9   Time9          2         15          5          3          5
10 Time10          1         10          4          4          6

Each table is of exactly the same dimensions. After reading each of the 100+ .csv files into R, I want to determine the mean and SD of each and every cell. That is, I want to calculate the mean and SD for (Time1,Substance1) and every other cell across the 100+ .csv files.

I imagine this is a fairly basic question, but my search has been unsuccessful.
Thanks in advance,
ECN

From mweiss at temple.edu Sun May 1 18:54:19 2011
From: mweiss at temple.edu (mary weiss)
Date: Sun, 1 May 2011 09:54:19 -0700 (PDT)
Subject: [R] Tests for the need of cluster analysis
Message-ID: <1304268859064-3488097.post@n4.nabble.com>

Does R have the capability to perform tests for the need of clustering analysis (e.g., in prabclus)? I am using panel data with two-way fixed effects but am unsure about whether I should be using a cluster option as well to estimate my model.

--
View this message in context: http://r.789695.n4.nabble.com/Tests-for-the-need-of-cluster-analysis-tp3488097p3488097.html
Sent from the R help mailing list archive at Nabble.com.

From andrew.e.coop at gmail.com Sun May 1 18:52:52 2011
From: andrew.e.coop at gmail.com (Andrew Coop)
Date: Sun, 1 May 2011 10:52:52 -0600
Subject: [R] Urgent: conditional formula for nls
Message-ID:

I have data vectors x and y both with 179 observations. I'm trying to fit a nonlinear model with five parameters using nls. The formula is only defined within a range of x-values, it should be zero otherwise, thus my attempted use of ifelse:

> df<-data.frame(x,y)
> nlsfit<-nls(y~ifelse(x>m&x

References: <23533F75-89A0-43F8-9822-5D6BA30DECF0@comcast.net> <1304086631063-3483785.post@n4.nabble.com>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From rainer.stuetz at gmail.com Sun May 1 20:52:18 2011
From: rainer.stuetz at gmail.com (Rainer Stuetz)
Date: Sun, 1 May 2011 20:52:18 +0200
Subject: [R] microsecond timestamp support
In-Reply-To: <71BB248B-ECF8-4E46-9FCD-050401E860B4@gmail.com>
References: <71BB248B-ECF8-4E46-9FCD-050401E860B4@gmail.com>
Message-ID:

On Sun, May 1, 2011 at 15:33, Joel Reymont wrote:
>
> Does R have support for microseconds in timestamps, e.g.
when reading this in
>
> "Time","Include","Kind","Duration"
> 2011-04-01 14:20:36.368324,Y,U,1.03238296509
> 2011-04-01 14:20:35.342732,Y,C,0.0252721309662
> 2011-04-01 14:20:34.337209,Y,R,0.00522899627686
>

See ?strptime:

  Specific to R is %OSn, which for output gives the seconds to 0 <= n <= 6
  decimal places (and if %OS is not followed by a digit, it uses the setting of
  getOption("digits.secs"), or if that is unset, n = 3). Further, for strptime
  %OS will input seconds including fractional seconds. Note that %S ignores
  (and not rounds) fractional parts on output.

dat <- read.table(textConnection(
'"Time","Include","Kind","Duration"
2011-04-01 14:20:36.368324,Y,U,1.03238296509
2011-04-01 14:20:35.342732,Y,C,0.0252721309662
2011-04-01 14:20:34.337209,Y,R,0.00522899627686'),
header=TRUE, sep=",")

R> dat$Time <- as.POSIXct(dat$Time, format = "%Y-%m-%d %H:%M:%OS")
R> dat$Time
[1] "2011-04-01 14:20:36.368" "2011-04-01 14:20:35.343"
[3] "2011-04-01 14:20:34.337"
R> options(digits.secs=6)
R> dat$Time
[1] "2011-04-01 14:20:36.368324" "2011-04-01 14:20:35.342732"
[3] "2011-04-01 14:20:34.337209"
R> class(dat$Time)
[1] "POSIXct" "POSIXt"

HTH,
Rainer

From djmuser at gmail.com Sun May 1 20:53:12 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Sun, 1 May 2011 11:53:12 -0700
Subject: [R] Mean/SD of Each Position in Table
In-Reply-To:
References:
Message-ID:

Hi:

I would do something like the following:

(1) Create a vector of the file names.
(2) Use lapply() to read the files into a list.
(3) Use the reshape or reshape2 package to melt the individual files into 'long' form.
(4) rbind together the resulting data frames.
(5) Use a summarization function to generate the means and standard deviations.

I created three data frames that have the structure you provided below and wrote them out to csv files. The following code creates a vector of file names, then uses lapply() to read the data files consecutively and assign them to components of a list.
Next, I create a small utility function that uses the reshape2 package to melt the data into 'long form'. The ldply function from package plyr is then called to apply the function to each file and then to bind them all together into a single data frame. Finally, the ddply() function in plyr is used to get the mean and standard deviation for each time/substance combination.

#### Code to create test files for the example
# File creation for test files:
ds_create <- function(name) {
    times <- paste('Time', 1:10, sep = '')
    cnames <- paste('Substance', 1:5, sep = '')
    m <- matrix(rpois(50, 7), nrow = 10)
    colnames(m) <- cnames
    m <- as.data.frame(m)
    m$Time <- times
    write.csv(m, file = paste(name, '.csv', sep = ''),
              quote = FALSE, row.names = FALSE)
}
nms <- paste('m', 1:3, sep = '')
sapply(nms, ds_create)
####

# Vector of file names
files <- paste('m', 1:3, '.csv', sep = '')

# Read the data frames into a list, where each data frame is a separate component
filelst <- lapply(files, read.csv, header = TRUE)

library(plyr)
library(reshape2)

# Function to melt a generic data frame
f <- function(df) {
    melt.data.frame(df, id = 'Time', variable_name = 'Substance', value_name = 'y')
}

# Apply the function to each component of the list and rbind the results together
bigdf <- ldply(filelst, f)

# Obtain the mean and sd for each Time/Substance combination
bigsumm <- ddply(bigdf, .(Time, Substance), summarise, mean = mean(y), sd = sd(y))

# ----
Caveat: If you have the reshape package loaded, then at present the value_name = assignment will not go through and the name of the last variable will be 'value'. In that event, you can either rename 'value' to 'y' with

names(bigdf)[3] <- 'y'

or change 'y' to 'value' before you invoke ddply() on bigdf. Check bigdf with head(bigdf) to verify that the names expected are 'Time', 'Substance' and 'y' before running the last command.
# ----

The result I get is

> dim(bigsumm)
[1] 50  4
> head(bigsumm)
    Time  Substance      mean       sd
1  Time1 Substance1 10.333333 2.516611
2  Time1 Substance2 10.666667 1.154701
3  Time1 Substance3  6.000000 2.645751
4  Time1 Substance4  6.333333 1.154701
5  Time1 Substance5  5.333333 1.527525
6 Time10 Substance1  4.666667 3.055050

The structure is what matters. You should be able to extend this template to your 100 data frames.

HTH,
Dennis

On Sun, May 1, 2011 at 8:48 AM, Nemergut, Edward *HS wrote:
> I have 100+ .csv files which have the basic format:
>
>> test
>         X Substance1 Substance2 Substance3 Substance4 Substance5
> 1   Time1         10          0          0          0          0
> 2   Time2          9          5          0          0          0
> 3   Time3          8         10          1          0          0
> 4   Time4          7         20          2          1          0
> 5   Time5          6         25          3          2          1
> 6   Time6          5         30          4          2          2
> 7   Time7          4         25          5          3          3
> 8   Time8          3         20          6          3          4
> 9   Time9          2         15          5          3          5
> 10 Time10          1         10          4          4          6
>
> Each table is of exactly the same dimensions. After reading each of the
> 100+ .csv files into R, I want determine the mean and SD of each and every
> cell. That is to ask, I to calculate the mean and SD for (Time1,Substance1)
> and every other cell from each of the 100+ .csv files.
>
> I imagine this is a fairly basic question, but my search has been
> unsuccessful.
>
> Thanks in advance,
> ECN
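
Since every table has exactly the same dimensions, a base-R alternative to the melt/ddply route above is to stack the tables into a three-dimensional array and summarise over the third dimension. The sketch below is only an illustration under assumptions: the three simulated matrices in 'tabs' stand in for the numeric part of the real files, which in practice would come from something like lapply(files, read.csv) with the label column moved into the row names.

```r
# Minimal sketch: cell-by-cell mean and SD across equally sized tables.
# 'tabs' is hypothetical data standing in for the 100+ real tables.
set.seed(1)
tabs <- replicate(3,
  matrix(rpois(50, 7), nrow = 10,
         dimnames = list(paste0("Time", 1:10), paste0("Substance", 1:5))),
  simplify = FALSE)

arr <- simplify2array(tabs)             # 10 x 5 x 3 array, one slice per table

cell_mean <- apply(arr, c(1, 2), mean)  # 10 x 5 matrix of cell means
cell_sd   <- apply(arr, c(1, 2), sd)    # 10 x 5 matrix of cell SDs
```

The results keep the layout of the input tables, so cell_mean["Time1", "Substance1"] is the mean of that cell over all tables.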
>

From djmuser at gmail.com Sun May 1 20:57:46 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Sun, 1 May 2011 11:57:46 -0700
Subject: [R] Urgent: conditional formula for nls
In-Reply-To:
References:
Message-ID:

Hi:

Instead of using ifelse(), you can multiply the logical statement by the rest of the expression. When the logical statement is false, its numerical value is zero.

HTH,
Dennis

On Sun, May 1, 2011 at 9:52 AM, Andrew Coop wrote:
> I have data vectors x and y both with 179 observations. I'm trying to
> fit a nonlinear model with five parameters using nls. The formula is
> only defined within a range of x-values, it should be zero otherwise,
> thus my attempted use of ifelse:
>
>> df<-data.frame(x,y)
>> nlsfit<-nls(y~ifelse(x>m&x
> Error in nlsModel(formula, mf, start, wts) :
>   singular gradient matrix at initial parameter estimates
> In addition: Warning messages:
> 1: In log((x - m)/(-x + m + abs(s))) : NaNs produced
> 2: In log((x - m)/(-x + m + abs(s))) : NaNs produced
> 3: In log((x - m)/(-x + m + abs(s))) : NaNs produced
> 4: In log((x - m)/(-x + m + abs(s))) : NaNs produced
> 5: In log((x - m)/(-x + m + abs(s))) : NaNs produced
> 6: In log((x - m)/(-x + m + abs(s))) : NaNs produced
> 7: In log((x - m)/(-x + m + abs(s))) : NaNs produced
>
> What am I doing wrong?
>
> Thanks,
> Andy
>

From f.harrell at vanderbilt.edu Sun May 1 21:36:00 2011
From: f.harrell at vanderbilt.edu (Frank Harrell)
Date: Sun, 1 May 2011 12:36:00 -0700 (PDT)
Subject: [R] recommendation on B for validate.lrm () ?
In-Reply-To: <1304182052594-3486200.post@n4.nabble.com>
References: <1304182052594-3486200.post@n4.nabble.com>
Message-ID: <1304278560175-3488384.post@n4.nabble.com>

For this case B=200 should work well if using the bootstrap. For cross-val. you can use B=10-fold cross-val and repeat the process 100 times for adequate precision, averaging over the 100 as done in http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/logistic.val.pdf (note this was using the Design package and there may be subtle changes with the rms package).

Frank

viostorm wrote:
>
> I have a logistic regression model I'm trying to do k-fold cross
> validation on.
>
> The number of observations is approximately 550 and an event rate of about
> 30%
>
> Does anyone have a recommendation for a B value to use for this data set?
>

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/recommendation-on-B-for-validate-lrm-tp3486200p3488384.html
Sent from the R help mailing list archive at Nabble.com.

From m.marcinmichal at gmail.com Sun May 1 21:23:47 2011
From: m.marcinmichal at gmail.com (m.marcinmichal)
Date: Sun, 1 May 2011 12:23:47 -0700 (PDT)
Subject: [R] Kolmogorov-Smirnov test
In-Reply-To:
References: <1303939363957-3479506.post@n4.nabble.com> <1304027619288-3482349.post@n4.nabble.com>
Message-ID: <1304277827393-3488364.post@n4.nabble.com>

Hi,

many thanks for helpful answer.

Best

Marcin M.

--
View this message in context: http://r.789695.n4.nabble.com/Kolmogorov-Smirnov-test-tp3479506p3488364.html
Sent from the R help mailing list archive at Nabble.com.

From sharma.ram.h at gmail.com Sun May 1 22:26:16 2011
From: sharma.ram.h at gmail.com (Ram H. Sharma)
Date: Sun, 1 May 2011 16:26:16 -0400
Subject: [R] quick help needed: split a number and "find and replace" type of function that works like in MS excel
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From wildscop at hotmail.com Sun May 1 21:44:40 2011
From: wildscop at hotmail.com (Ehsan Karim)
Date: Sun, 1 May 2011 12:44:40 -0700
Subject: [R] Longitudinal data with non-randomized subjects
In-Reply-To:
References:
Message-ID:

Apology for reposting, but the format of earlier message got distorted; hopefully this time it will be readable:

From: wildscop at hotmail.com
To: r-help at r-project.org
Subject: Longitudinal data with non-randomized subjects
Date: Sun, 1 May 2011 00:34:08 -0700

Dear List,

I have a theoretical question related to epidemiological data analysis:

If the treatment status (tx = 0,1) changes over time for the patients in a non-randomized cohort, is there a way to estimate the treatment effect? (i.e., after joining the study, some patients may have to wait for a period of time before receiving the treatment, i.e., the situation of patient with id == 2 for the following data)

Data format is like the stanford heart transplant data (Therneau et al 2000, p69), but the patients were not randomized in selection and the covariate balance is not achieved:

id  time    censor  tx  x1   x2
1   (0,10]  1       0   x11  x12
2   (0,8 ]  0       0   x21  x22
2   (9,19]  1       1   x21  x22
3   (0,13]  0       1   x31  x32

Is counting process form of a Cox model (coxph with start, stop, censoring status ~ tx + x1 + x2 covariates) sufficient? Is it possible to implement the propensity score methodology (Rosenbaum et al, 1983) in such situations?

Any ideas/suggestions would be highly appreciated.

Thanks,
Ehsan

From HDoran at air.org Sun May 1 22:52:40 2011
From: HDoran at air.org (Doran, Harold)
Date: Sun, 1 May 2011 16:52:40 -0400
Subject: [R] bwplot in ascending order
Message-ID:

Can anyone point me to examples with R code where bwplot in lattice is used to order the boxes in ascending order? I have found the following discussion and it partly works.
But, I have a conditioning variable, so my example is more like

bwplot(var1 ~ var2|condition, dat)

The example in the discussion below works only when there is not a conditioning variable as far as I can tell. I can tweak the example below to work, but then I get some ugly labels in the lattice plot. It seems index.cond is supposed to help me solve this, but I cannot find good examples showing its use.

Thanks
Harold

http://r.789695.n4.nabble.com/bwplot-reorder-factor-on-y-axis-td790903.html

From dwinsemius at comcast.net Sun May 1 23:02:23 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Sun, 1 May 2011 14:02:23 -0700
Subject: [R] indexing into a data.frame using another data.frame that also contains values for replacement
In-Reply-To:
References:
Message-ID: <2D2B0999-3C86-48FD-907D-240EA668CDC0@comcast.net>

On Apr 30, 2011, at 4:18 PM, Alice Wines wrote:

> Hello all,
>
> I have a quandary I have been scratching my head about for a
> while. I've searched the manual and the web and have not been able to
> find an acceptable result, so I am hoping for some help.
>
> I have two data frames and I want to index into the first using
> the second, and replace the specific values I have indexed with more
> values from the second data.frame. I can do this using a loop, but I
> wanted a quicker solution with no loops involved.
>
> Although my data set is much larger than this, a small example of what
> I am trying to do is as follows:
>
> df1 <- data.frame(rows=c("A","B","C", "B", "C", "A"),
> columns=c("21_2", "22_2", "23_2", "21_2", "22_2", "23_2"),
> values=c(3.3, 2.5, 67.2, 44.3, 53, 66))
> df2 <- data.frame(matrix(rep(NA, length(df1$values)),nrow=3, ncol=3))
> names(df2) <- c("21_2", "22_2", "23_2")
> row.names(df2) <- c("A", "B", "C")
>
>> df1
>   rows columns values
> 1    A    21_2    3.3
> 2    B    22_2    2.5
> 3    C    23_2   67.2
> 4    B    21_2   44.3
> 5    C    22_2   53.0
> 6    A    23_2   66.0
>

 > require(Matrix)
 > xtabs(values~rows+columns, data=df1, sparse=TRUE)
3 x 3 sparse Matrix of class "dgCMatrix"
  21_2 22_2 23_2
A  3.3  .   66.0
B 44.3  2.5  .
C  .   53.0 67.2

>
>> df2
>   21_2 22_2 23_2
> A   NA   NA   NA
> B   NA   NA   NA
> C   NA   NA   NA
>
> Note that none of the same locations in df2 are specified twice
> in df2, so I'm not worried about over-writing it.
>
> I have tried 'mapply' and 'replace', but apparently either they do
> not work well for this or I don't understand how to use them properly
> for this purpose. My understanding is that 'replace' needs a vector
> input and that one cannot create a vector of vectors, so I couldn't
> pass my indices to 'replace'.
>
> When I tried mapply, the code I used was something like what
> follows:
>
> df3 <- mapply('[<-' , df2, paste(as.character(df1$rows),
> as.character(df1$columns), sep=', '), df1$values)
>
> but it yields the following strange result
>
>> df3
>         21_2 22_2 23_2
>         NA   NA   NA   NA   NA NA
>         NA   NA   NA   NA   NA NA
>         NA   NA   NA   NA   NA NA
> A, 21_2 3.3  2.5  67.2 44.3 53 66
>
> What I want to see is the following:
>
>> df3
>   21_2 22_2 23_2
> A  3.3   NA 66.0
> B 44.3  2.5   NA
> C   NA 53.0 67.2

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

From mailinglist.honeypot at gmail.com Sun May 1 23:03:59 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Sun, 1 May 2011 17:03:59 -0400
Subject: [R] quick help needed: split a number and "find and replace" type of function that works like in MS excel
In-Reply-To:
References:
Message-ID:

Hi,

There are a couple of ways to do what you want. I'll provide the fodder and let you finish the implementation.

On Sun, May 1, 2011 at 4:26 PM, Ram H. Sharma wrote:
> Hi R experts
>
> I have a couple of quick questions:
>
> Q1
> #my data
> set.seed(12341)
> SN <- 1:100
> pool<- c(12,13,14, 23, 24, 34)
> CT1<- sample(pool, 100, replace= TRUE)
> set.seed(1242)
> CT2 <- sample(pool, 100, replace= TRUE)
> set.seed(142)
> CT3 <- sample(pool, 100, replace= TRUE)
> # the number of variables run to end of column 20000
> mydf <- data.frame(SN, CT1, CT2, CT3)
>
> First question: how can I split 12 into 1 2, 13 into 1 3, 14 into 1 4?
> What I am trying here is to split each number into two and make separate
> variable CT1a and CT1b, CT2a and CT2b, CT3a and CT3b.
>
> Tried with strsplit () but I believe this works with characters only

You can convert your numbers to characters, if you like.
Using your dataset, consider:

R> ct1.char <- as.character(mydf$CT1)
R> ct1.char <- strsplit(as.character(mydf$CT1), '')
R> ct1a <- sapply(ct1.char, '[', 1) ## "non-obvious" use of '[' as
R> ct1b <- sapply(ct1.char, '[', 2) ## a function is intentional :-)
R> head(data.frame(ct1a=ct1a, ct1b=ct1b))
  ct1a ct1b
1    3    4
2    1    4
3    2    3
4    1    4
5    3    4
6    2    3

> Q2
> Is there any function that works in the same manner as find and replace
> function MS excel. Just for example, if I want to replace all 1s in the
> above data frame with "A", 2 with "B". Thus the number 12 will be converted
> to "AB". I tried with car but it is very slow as I need to apply it to a very large
> dataframe.

Try gsub:

R> head(ct1a)
[1] "3" "1" "2" "1" "3" "2"
R> head(gsub("1", "A", ct1a))
[1] "3" "A" "2" "A" "3" "2"

or you can use a "translation table"

R> xlate <- c('1'='A', '2'='B', '3'='C')
R> head(xlate[ct1a])
  3   1   2   1   3   2
"C" "A" "B" "A" "C" "B"

You might also consider not converting your original data into characters and splitting off the integers -- you can use modulo arithmetic to get each digit, ie:

R> head(mydf$CT1)
[1] 34 14 23 14 34 23

## First digit
R> head(as.integer(mydf$CT1 / 10))
[1] 3 1 2 1 3 2

## Second digit
R> head(mydf$CT1 %% 10)
[1] 4 4 3 4 4 3

There's some food for thought ..

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From philb at philbrierley.com Sun May 1 23:57:52 2011
From: philb at philbrierley.com (pdb)
Date: Sun, 1 May 2011 14:57:52 -0700 (PDT)
Subject: [R] caret - prevent resampling when no parameters to find
Message-ID: <1304287072729-3488761.post@n4.nabble.com>

I want to use caret to build a model with an algorithm that actually has no parameters to find. How do I stop it from repeatedly building the same model 25 times?
library(caret) data(mdrr) LOGISTIC_model <- train(mdrrDescr,mdrrClass ,method='glm' ,family=binomial(link="logit") ) LOGISTIC_model 528 samples 342 predictors 2 classes: 'Active', 'Inactive' Pre-processing: None Resampling: Bootstrap (25 reps) Summary of sample sizes: 528, 528, 528, 528, 528, 528, ... Resampling results Accuracy Kappa Accuracy SD Kappa SD 0.552 0.0999 0.0388 0.0776 -- View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3488761.html Sent from the R help mailing list archive at Nabble.com. From jholtman at gmail.com Mon May 2 00:01:56 2011 From: jholtman at gmail.com (jim holtman) Date: Sun, 1 May 2011 18:01:56 -0400 Subject: [R] microsecond timestamp support In-Reply-To: References: <71BB248B-ECF8-4E46-9FCD-050401E860B4@gmail.com> Message-ID: One thing to watch out for using POSIXct is 1us is about the limit of accuracy due to floating point (see FAQ 7.31). Notice that printing out today's date requires about 15 digits with microsecond granularity. Notice in the example that if the time difference between intervals is 0.1 us, you have exceeded the limit of precision with POSIXct. 
If you need subsecond granularity, then maybe you want to break your timing up into a variable with 'days' and then another with time within a day, or create a different base for POSIXct: > x <- as.POSIXct("2011-05-01 17:55:23.123456") > x [1] "2011-05-01 17:55:23 EDT" > xc <- as.numeric(x) > print(xc, digits=20) [1] 1304286923.123456 > xc <- xc + seq(0, by = 0.000001, length = 20) > diff(xc) # about 1us granularity [1] 0.0000009536743 0.0000009536743 0.0000011920929 0.0000009536743 0.0000009536743 0.0000009536743 [7] 0.0000009536743 0.0000011920929 0.0000009536743 0.0000009536743 0.0000009536743 0.0000009536743 [13] 0.0000011920929 0.0000009536743 0.0000009536743 0.0000009536743 0.0000009536743 0.0000009536743 [19] 0.0000011920929 > xc <- as.numeric(x) > xc <- xc + seq(0, by = 0.0000001, length = 20) # by 0.1 us > diff(xc) # notice loss of precision [1] 0.0000000000000 0.0000002384186 0.0000000000000 0.0000002384186 0.0000000000000 0.0000002384186 [7] 0.0000000000000 0.0000000000000 0.0000002384186 0.0000000000000 0.0000002384186 0.0000000000000 [13] 0.0000000000000 0.0000002384186 0.0000000000000 0.0000002384186 0.0000000000000 0.0000002384186 [19] 0.0000000000000 > On Sun, May 1, 2011 at 2:52 PM, Rainer Stuetz wrote: > On Sun, May 1, 2011 at 15:33, Joel Reymont wrote: >> >> Does R have support for microseconds in timestamps, e.g. when reading this in >> >> "Time","Include","Kind","Duration" >> 2011-04-01 14:20:36.368324,Y,U,1.03238296509 >> 2011-04-01 14:20:35.342732,Y,C,0.0252721309662 >> 2011-04-01 14:20:34.337209,Y,R,0.00522899627686 >> > > See ?strptime: > > ?Specific to R is %OSn, which for output gives the seconds to 0 <= n <= 6 > ?decimal places (and if %OS is not followed by a digit, it uses the setting of > ?getOption("digits.secs"), or if that is unset, n = 3). Further, for strptime > ?%OS will input seconds including fractional seconds. Note that %S ignores > ?(and not rounds) fractional parts on output. 
> > > dat <- read.table(textConnection( > '"Time","Include","Kind","Duration" > 2011-04-01 14:20:36.368324,Y,U,1.03238296509 > 2011-04-01 14:20:35.342732,Y,C,0.0252721309662 > 2011-04-01 14:20:34.337209,Y,R,0.00522899627686'), > header=TRUE, sep=",") > > R> dat$Time <- as.POSIXct(dat$Time, "%Y-%m-%d %H:%M:%OS6") > R> dat$Time > [1] "2011-04-01 14:20:36.368" "2011-04-01 14:20:35.343" > [3] "2011-04-01 14:20:34.337" > R> options(digits.secs=6) > R> dat$Time > [1] "2011-04-01 14:20:36.368324" "2011-04-01 14:20:35.342732" > [3] "2011-04-01 14:20:34.337209" > R> class(dat$Time) > [1] "POSIXct" "POSIXt" > > HTH, > Rainer > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From mxkuhn at gmail.com Mon May 2 00:33:05 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Sun, 1 May 2011 18:33:05 -0400 Subject: [R] caret - prevent resampling when no parameters to find In-Reply-To: <1304287072729-3488761.post@n4.nabble.com> References: <1304287072729-3488761.post@n4.nabble.com> Message-ID: It isn't building the same model since each fit is created from different data sets. The resampling is sort of the point of the function, but if you really want to avoid it, supply your own index in trainControl that has every index (eg, index = seq(along = mdrrClass)). In this case, the performance it gives is the apparent error rate. Max On Sun, May 1, 2011 at 5:57 PM, pdb wrote: > I want to use caret to build a model with an algorithm that actually has no > parameters to find. > > How do I stop it from repeatedly building the same model 25 times? > > > library(caret) > data(mdrr) > LOGISTIC_model <- train(mdrrDescr,mdrrClass > ? ? ? ? ? ? ? ? ? ? ? ?,method='glm' > ? ? ? ? ? ? ? ? ? 
? ? ?,family=binomial(link="logit") > ? ? ? ? ? ? ? ? ? ? ? ?) > LOGISTIC_model > > 528 samples > 342 predictors > ?2 classes: 'Active', 'Inactive' > > Pre-processing: None > Resampling: Bootstrap (25 reps) > > Summary of sample sizes: 528, 528, 528, 528, 528, 528, ... > > Resampling results > > ?Accuracy ?Kappa ? Accuracy SD ?Kappa SD > ?0.552 ? ? 0.0999 ?0.0388 ? ? ? 0.0776 ?-- > View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3488761.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Max From philb at philbrierley.com Mon May 2 00:41:06 2011 From: philb at philbrierley.com (pdb) Date: Sun, 1 May 2011 15:41:06 -0700 (PDT) Subject: [R] caret - prevent resampling when no parameters to find In-Reply-To: References: <1304287072729-3488761.post@n4.nabble.com> Message-ID: <1304289666790-3488881.post@n4.nabble.com> Hi Max, But in this example, it says the sample size is the same as the total number of samples, so unless the sampling is done by columns, wouldn't you get exactly the same model each time for logistic regression? ps - great package btw. I'm just beginning to explore its potential now.-- View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3488881.html Sent from the R help mailing list archive at Nabble.com. 
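
A minimal base R sketch of the question raised above, offered as an illustration and not as a description of what caret does internally. Each bootstrap resample has the same number of rows as the original data but is drawn with replacement, so the rows, and therefore the fitted coefficients, differ from one resample to the next. The iris subset, the two predictors and the 25 repetitions are stand-ins chosen for this sketch; the mdrr data from the post are not used.

```r
set.seed(42)

# Stand-in two-class data set (assumption: not the mdrr data from the thread)
dat <- droplevels(iris[iris$Species != "setosa", ])
n   <- nrow(dat)

# Fit the same logistic regression on 25 bootstrap resamples of size n
coefs <- t(replicate(25, {
  idx <- sample(n, n, replace = TRUE)   # same size, drawn with replacement
  fit <- glm(Species ~ Sepal.Length + Sepal.Width,
             data = dat[idx, ], family = binomial)
  coef(fit)
}))

# Average fraction of distinct original rows that appear in a resample
frac_unique <- mean(replicate(25,
  length(unique(sample(n, n, replace = TRUE))) / n))

# Spread of each coefficient across the 25 fits
coef_sd <- apply(coefs, 2, sd)
print(frac_unique)
print(coef_sd)
```

A nonzero spread in coef_sd shows that the 25 fits are not the same model, even though every resample has n rows.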
From mxkuhn at gmail.com Mon May 2 00:45:59 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Sun, 1 May 2011 18:45:59 -0400 Subject: [R] Bigining with a Program of SVR In-Reply-To: <1304106463512-3484476.post@n4.nabble.com> References: <1304106463512-3484476.post@n4.nabble.com> Message-ID: When you say "variable" do you mean predictors or responses? In either case, they do. You can generally tell by reading the help files and looking at the examples. Max On Fri, Apr 29, 2011 at 3:47 PM, ypriverol wrote: > Hi: > ?I'm starting a research of Support Vector Regression. I want to obtain a > model to predict a property A with > ?a set of property B, C, D, ... ?This problem is very common for example in > QSAR models. I want to know > ?some examples and package that could help me in this way. I know about > caret and e1071. But I' don't > ?know if this package can work with continues variables.? > > Thanks in advance > > -- > View this message in context: http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3484476.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Max From mxkuhn at gmail.com Mon May 2 00:51:28 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Sun, 1 May 2011 18:51:28 -0400 Subject: [R] caret - prevent resampling when no parameters to find In-Reply-To: <1304289666790-3488881.post@n4.nabble.com> References: <1304287072729-3488761.post@n4.nabble.com> <1304289666790-3488881.post@n4.nabble.com> Message-ID: No, the sampling is done on rows. The definition of a bootstrap (re)sample is one which is the same size as the original data but taken with replacement. 
The "Accuracy SD" and "Kappa SD" columns give you a sense of how the model performance varied across these bootstrap data sets (i.e. they are not the same data set). In the end, the original training set is used to fit the final model that is used for prediction. Max On Sun, May 1, 2011 at 6:41 PM, pdb wrote: > Hi Max, > > But in this example, it says the sample size is the same as the total number > of samples, so unless the sampling is done by columns, wouldn't you get > exactly the same model each time for logistic regression? > > ps - great package btw. I'm just beginning to explore its potential now.-- > View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3488881.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Max From philb at philbrierley.com Mon May 2 01:09:17 2011 From: philb at philbrierley.com (pdb) Date: Sun, 1 May 2011 16:09:17 -0700 (PDT) Subject: [R] caret - prevent resampling when no parameters to find In-Reply-To: References: <1304287072729-3488761.post@n4.nabble.com> <1304289666790-3488881.post@n4.nabble.com> Message-ID: <1304291357170-3488911.post@n4.nabble.com> Thanks for the clarification Max - I should have realised that. One final question, I like caret because it lets me pass in data to all functions in the same way. For glm I have only ever used the formula notation and did not see a way to pass in predictors and a target individually. How do I do this? How do I get the 2nd example below to work? Many thanks. 
LOGISTIC_model <- train(mdrrDescr,mdrrClass ,method='glm' ,family=binomial(link="logit") ) LOGISTIC_model1 <- glm(mdrrDescr,mdrrClass, family=binomial(link="logit")) -- View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3488911.html Sent from the R help mailing list archive at Nabble.com. From A.Robinson at ms.unimelb.edu.au Mon May 2 01:14:40 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Mon, 2 May 2011 09:14:40 +1000 Subject: [R] using tapply with multiple variables In-Reply-To: References: Message-ID: <20110501231440.GF48756@ms.unimelb.edu.au> This is a nice demonstration of the formula interface to aggregate. A less elegant alternative is to pass lists as arguments. with(dd, aggregate(Correct, by = list(Subject = Subject, Group = Group), FUN = function(x) sum(x == 'C'))) Using a list is advantageous if you want to make the summary of more than one variable (which does not seem to be the case, here) --- I believe that the formula interface doesn't allow for that. That would be set up like this with(dd, aggregate(x = list(Correct = Correct, other target variables listed here, ...), by = list(Subject = Subject, Group = Group), FUN = function(x) sum(x == 'C'))) Cheers Andrew On Sat, Apr 30, 2011 at 10:03:24PM -0700, Dennis Murphy wrote: > Hi: > > If you have R 2.11.x or later, one can use the formula version of aggregate(): > > aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = function(x) > sum(x == 'C')) > > A variety of contributed packages (plyr, data.table, doBy, sqldf and > remix, among others) have similar capabilities. 
> > If you want some additional summaries (e.g., percent correct), here is > an example function for a single subject/group that aggregate() can > use to propagate to all subgroups and subjects (I encourage you to > play with it): > > f <- function(x) { > Correct <- sum(x == 'C') > Percent <- round(100 * Correct/length(x), 3) > c(Number = Correct, Percent = Percent) > } > aggregate(Correct ~ Subject + Group, data = ALLDATA, FUN = f) > > The particular function isn't as important as knowing you can do this > sort of thing. Several of the contributed packages indicated above > have similar, if not superior, capabilities, depending on the > situation. > > Toy example to test the above: > > dd <- data.frame(Subject = rep(1:5, each = 100), > Group = rep(rep(c('C', 'T'), each = 50), 5), > Correct = factor(rbinom(500, 1, 0.8), labels = c('I', 'C'))) > aggregate(Correct ~ Subject + Group, data = dd, FUN = function(x) sum(x == 'C')) > Subject Group Correct > 1 1 C 40 > 2 2 C 36 > 3 3 C 39 > 4 4 C 37 > 5 5 C 41 > 6 1 T 43 > 7 2 T 45 > 8 3 T 37 > 9 4 T 45 > 10 5 T 36 > aggregate(Correct ~ Subject + Group, data = dd, FUN = f) > Subject Group Correct.Number Correct.Percent > 1 1 C 40 80 > 2 2 C 36 72 > 3 3 C 39 78 > 4 4 C 37 74 > 5 5 C 41 82 > 6 1 T 43 86 > 7 2 T 45 90 > 8 3 T 37 74 > 9 4 T 45 90 > 10 5 T 36 72 > > HTH, > Dennis > > On Sat, Apr 30, 2011 at 12:28 PM, Kevin Burnham wrote: > > HI All, > > > > I have a long data file generated from a minimal pair test that I gave to > > learners of Arabic before and after a phonetic training regime. ?For each of > > thirty some subjects there are 800 rows of data, from each of 400 items at > > pre and posttest. ?For each item the subject got correct, there is a 'C' in > > the column 'Correct'. ?The line: > > > > tapply(ALLDATA$Correct, ALLDATA$Subject, function(x)sum(x=="C")) > > > > gives me the sum of correct answers for each subject. > > > > However, I would like to have that sum separated by Time (pre or post). 
?Is > > there a simple way to do that? > > > > > > What if I further wish to separate by Group (T or C)? > > > > Thanks, > > Kevin > > > > ? ? ? ?[[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From philb at philbrierley.com Mon May 2 01:18:35 2011 From: philb at philbrierley.com (pdb) Date: Sun, 1 May 2011 16:18:35 -0700 (PDT) Subject: [R] caret - prevent resampling when no parameters to find In-Reply-To: <1304291357170-3488911.post@n4.nabble.com> References: <1304287072729-3488761.post@n4.nabble.com> <1304289666790-3488881.post@n4.nabble.com> <1304291357170-3488911.post@n4.nabble.com> Message-ID: <1304291915391-3488923.post@n4.nabble.com> glm.fit - answered my own question by reading the manual!-- View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3488923.html Sent from the R help mailing list archive at Nabble.com. 
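
A short base R sketch of the glm.fit route mentioned above, under stated assumptions: the data are simulated for this illustration, and the names x, y, X and dat are hypothetical. It compares the matrix interface (glm.fit) with the formula interface (glm). Two points are worth knowing: glm.fit does not add an intercept column for you, and it returns a plain list without the "glm" class, so predict() has no method for it and fitted probabilities have to be computed by hand.

```r
set.seed(1)
n <- 200
x <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(0.5 + x[, "x1"] - 0.5 * x[, "x2"]))

# Matrix interface: the intercept column must be supplied explicitly
X       <- cbind(Intercept = 1, x)
fit_mat <- glm.fit(X, y, family = binomial(link = "logit"))

# Formula interface on the same data
dat     <- data.frame(y = y, x)
fit_frm <- glm(y ~ x1 + x2, data = dat, family = binomial(link = "logit"))

# Both routes give the same coefficients
same_coef <- all.equal(unname(fit_mat$coefficients), unname(coef(fit_frm)))

# glm.fit() output is a plain list, so compute the probabilities directly
p_mat     <- plogis(drop(X %*% fit_mat$coefficients))
p_frm     <- predict(fit_frm, type = "response")
same_pred <- all.equal(unname(p_mat), unname(p_frm))

print(c(same_coef = isTRUE(same_coef), same_pred = isTRUE(same_pred)))
```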
From A.Robinson at ms.unimelb.edu.au Mon May 2 01:22:14 2011
From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson)
Date: Mon, 2 May 2011 09:22:14 +1000
Subject: [R] Different results of coefficients by packages penalized and glmnet
In-Reply-To: 
References: 
Message-ID: <20110501232214.GG48756@ms.unimelb.edu.au>

Hi Yao,

I can't answer that question, but I offer the following thoughts for
your consideration.

Generally it's best to approach the package maintainers directly with
questions like these. You can find their contact details in the
package documentation.

Also, you will want to make sure that you provide commented, minimal,
self-contained, reproducible code. I can't run the code below because
I don't have the data. Try to create an example that shows your
problem using data that will be readily available to the maintainers.
Perhaps one of the packages provides an example dataset --- that would
be best. If not, you should write code to generate an example
dataset, or be prepared to share your own data.

I hope that this helps,

Andrew

On Sun, May 01, 2011 at 05:01:54PM +0800, zhu yao wrote:
> Dear R users:
>
> Recently, I learn to use penalized logistic regression. Two packages
> (penalized and glmnet) have the function of lasso.
> So I write these code. However, I got different results of coef. Can someone
> kindly explain.
> > # lasso using penalized > library(penalized) > pena.fit2<-penalized(HRLNM,penalized=~CN+NoSus,lambda1=1,model="logistic",standardize=TRUE) > pena.fit2 > coef(pena.fit2) > opt<-optL1(HRLNM,penalized=~CN+NoSus,fold=5) > opt$lambda > coef(opt$fullfit) > prof<-profL1(HRLNM,penalized=~CN+NoSus,fold=opt$fold,steps=20) > plot(prof$lambda, prof$cvl, type="l") > plotpath(prof$fullfit) > pena.fit2<-penalized(HRLNM,penalized=~CN+NoSus,lambda1=opt$lambda,model="logistic",standardize=TRUE,steps=20) > plotpath(pena.fit2) > pena.fit2<-penalized(HRLNM,penalized=~CN+NoSus,lambda1=opt$lambda,model="logistic",standardize=TRUE) > coef(pena.fit2) > > > #lasso using gamnet > library(glmnet) > factors<-matrix(c(CN,NoSus),ncol=2) > colnames(factors)<-c("CN","NoSus") > glmn.fit2<-glmnet(x=factors,y=HRLNM,family="binomial") > cvglmnet<-cv.glmnet(x=factors,y=HRLNM,family="binomial",nfolds=5) > plot(cvglmnet) > cvglmnet$lambda.min > which(cvglmnet$lambda==cvglmnet$lambda.min) > glmn.fit2<-glmnet(x=factors,y=HRLNM,family="binomial",lambda=cvglmnet$lambda.min) > coef(glmn.fit2) > > > > Thanks a lot > > btw: how to calculate the C.I. of coefs? > > > *Yao Zhu* > *Department of Urology > Fudan University Shanghai Cancer Center > Shanghai, China* > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From erich.neuwirth at univie.ac.at Mon May 2 01:32:19 2011 From: erich.neuwirth at univie.ac.at (Neuwirth Erich) Date: Mon, 2 May 2011 01:32:19 +0200 Subject: [R] multiple mosaic plots layout Message-ID: I would like to display multiple mosaic plots from vcd (not defined by a model but derived from different data sets) side by side. Neither par(mfrow=...) nor layout seem to allow to arrange multiple mosaic plots in a grid. Is there an easy way of arranging mosaics in a grid? From mxkuhn at gmail.com Mon May 2 01:34:57 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Sun, 1 May 2011 19:34:57 -0400 Subject: [R] caret - prevent resampling when no parameters to find In-Reply-To: <1304291915391-3488923.post@n4.nabble.com> References: <1304287072729-3488761.post@n4.nabble.com> <1304289666790-3488881.post@n4.nabble.com> <1304291357170-3488911.post@n4.nabble.com> <1304291915391-3488923.post@n4.nabble.com> Message-ID: Not all modeling functions have both the formula and "matrix" interface. For example, glm() and rpart() only have formula method, enet() has only the matrix interface and ksvm() and others have both. This was one reason I created the package (so we don't have to remember all this). train() lets you specify the model either way. When the actual model is fit, it favors the matrix interface whenever possible (since it is more efficient) and works out the details behind the scenes. 
For your example, you can fit the model you want using train(): train(mdrrDescr,mdrrClass,method='glm') If y is a factor, it automatically adds the 'family = binomial' option when the model is fit (so you don't have to). Max On Sun, May 1, 2011 at 7:18 PM, pdb wrote: > glm.fit - answered my own question by reading the manual!-- > View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3488923.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Max From jholtman at gmail.com Mon May 2 01:39:20 2011 From: jholtman at gmail.com (jim holtman) Date: Sun, 1 May 2011 19:39:20 -0400 Subject: [R] importing and filtering time series data In-Reply-To: <13AE38B7-6EE4-41C6-85B5-4256C2875457@gmail.com> References: <13AE38B7-6EE4-41C6-85B5-4256C2875457@gmail.com> Message-ID: Here is one approach. 
It would be good to provide a reasonable sample of data:

> x <- unclass(Sys.time())  # today's date
> # create some data
> # increments by ~ 0.1 seconds
> len <- cumsum(runif(100, 0, 0.1))
> dataFile <- data.frame(time = x + len,
+     flag = sample(c("Y", "N"), 100, TRUE),
+     dur = runif(100, 10,1000)
+     )
> write.csv(dataFile, file = 'myData.csv', row.names = FALSE)
>
> # read the data and summarize by 1 second intervals
> input <- read.csv('myData.csv')
> # remove "N" (keep only the rows flagged "Y")
> input <- subset(input, flag == "Y")
> require(data.table)  # I like this for creating summaries
> input <- data.table(input)
> # add column for summary
> input$key <- factor(trunc(input$time))
> input[,
+     list(count = length(time)
+         , latency = mean(dur)
+         , var = var(dur)
+         , '5%' = quantile(dur, prob = 0.05)
+         , '95%' = quantile(dur, prob = 0.95)
+         )
+     , by = key
+     ]
            key count  latency       var       X5.     X95.
[1,] 1304293090     6 558.3471  73765.28 255.09390 872.3692
[2,] 1304293091     8 580.4440 103743.05 132.39461 963.2297
[3,] 1304293092    10 494.1759  62945.55 150.89719 869.8083
[4,] 1304293093    10 557.1942 105834.81 102.53878 941.1442
[5,] 1304293094    17 477.2077 106452.72  35.15032 947.0750
>
>

On Fri, Apr 29, 2011 at 11:27 AM, Joel Reymont wrote:
> Folks,
>
> I'm new to R and would like to use it to analyze web server performance data.
>
> I collect the data in this CSV format:
>
> 1304083104.41,Y,668.856249809
> 1304083104.41,Y,348.143193007
>
> First column is a timestamp, rows with N instead of Y need to be skipped and the last column has the same format as the first column, except it's request duration (latency).
>
> I would like to calculate average number of requests per second, mean latency, variance, 5 and 95 percentiles.
>
> What is the best way to accomplish this, starting with importing of time series?
>
> ? ? ? 
?Thanks, Joel > > -------------------------------------------------------------------------- > - for hire: mac osx device driver ninja, kernel extensions and usb drivers > ---------------------+------------+--------------------------------------- > http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont > ---------------------+------------+--------------------------------------- > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From philb at philbrierley.com Mon May 2 02:05:10 2011 From: philb at philbrierley.com (pdb) Date: Sun, 1 May 2011 17:05:10 -0700 (PDT) Subject: [R] caret - prevent resampling when no parameters to find In-Reply-To: References: <1304287072729-3488761.post@n4.nabble.com> <1304289666790-3488881.post@n4.nabble.com> <1304291357170-3488911.post@n4.nabble.com> <1304291915391-3488923.post@n4.nabble.com> Message-ID: <1304294710672-3489020.post@n4.nabble.com> Thanks again Max - a great time saver this is. Now just for my sanity, if I use glm.fit to build a model where I have the matrices, how do I then use the predict function without getting an error message? > LOGISTIC_model1 <- glm.fit(mdrrDescr,mdrrClass, > family=binomial(link="logit")) Warning messages: 1: glm.fit: algorithm did not converge 2: glm.fit: fitted probabilities numerically 0 or 1 occurred > predict(LOGISTIC_model1) Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "c('double', 'numeric')" Secondly, caret acts as a nice wrapper to protect me from all this, and it does the resampling to give me an idea of the expected model fit. 
If I was doing a parameter search, would it do all this resampling for each combination of parameters? Now if I just want to build a model and not worry about all the resampling (in my case I just want a set of baseline predictions to compare various variable selections methods against) it would be nice if there was a simple option to turn off the resampling. -- View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3489020.html Sent from the R help mailing list archive at Nabble.com. From baptiste.auguie at googlemail.com Mon May 2 02:06:50 2011 From: baptiste.auguie at googlemail.com (baptiste auguie) Date: Mon, 2 May 2011 12:06:50 +1200 Subject: [R] multiple mosaic plots layout In-Reply-To: References: Message-ID: Unfortunately, it seems that vcd doesn't return grobs but draws directly to the device, which prevents a concise solution. You could try the following, library(gridExtra) library(vcd) data("Titanic") p = grid.grabExpr(mosaic(Titanic)) grid.arrange(p, p, p, ncol=2) Or, more versatile but also more verbose, pushViewport(...) mosaic(...) upViewport() pushViewport(...) mosaic(...) upViewport() etc.. HTH, baptiste On 2 May 2011 11:32, Neuwirth Erich wrote: > I would like to display multiple mosaic plots from vcd (not defined by a model but derived from different data sets) > side by side. > Neither par(mfrow=...) > nor layout seem to allow to arrange multiple mosaic plots in a grid. > Is there an easy way of arranging mosaics in a grid? > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
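
A sketch of the more verbose viewport approach from the reply above, with the elided arguments filled in for illustration only. It assumes the vcd package is installed and that mosaic() accepts newpage = FALSE to draw into the current viewport instead of starting a new page; the two tables and the 1 x 2 layout are choices made for this sketch, not something taken from the thread.

```r
library(grid)

# Two tables derived from different data sets (both ship with base R)
tab1 <- margin.table(Titanic, c(1, 4))        # Class x Survived
tab2 <- margin.table(HairEyeColor, c(1, 2))   # Hair x Eye

# Assumption: vcd is installed; the drawing is skipped otherwise
if (requireNamespace("vcd", quietly = TRUE)) {
  grid.newpage()
  # One row, two columns; each mosaic is drawn in its own cell
  pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))

  pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
  vcd::mosaic(tab1, newpage = FALSE)   # draw inside the current cell
  upViewport()

  pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
  vcd::mosaic(tab2, newpage = FALSE)
  upViewport()

  upViewport()   # leave the layout viewport
}
```

For a larger grid, change nrow and ncol in grid.layout() and set layout.pos.row and layout.pos.col for each cell.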
>

From xiagao1982 at gmail.com Mon May 2 02:20:51 2011
From: xiagao1982 at gmail.com (xiagao1982)
Date: Mon, 2 May 2011 08:20:51 +0800
Subject: [R] How to pass objects from local() to GlobalEnv
Message-ID: <201105020820330263691@gmail.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From wwwhsd at gmail.com Mon May 2 02:30:41 2011
From: wwwhsd at gmail.com (Henrique Dallazuanna)
Date: Sun, 1 May 2011 21:30:41 -0300
Subject: [R] How to pass objects from local() to GlobalEnv
In-Reply-To: <201105020820330263691@gmail.com>
References: <201105020820330263691@gmail.com>
Message-ID: 

Try this:

local(x <<- 1)

On Sun, May 1, 2011 at 9:20 PM, xiagao1982 wrote:
> Hi all,
>
> I create some objects in local(), and want to pass them to GlobalEnv. How can I do this? Thanks!
>
>
>
>
> xiagao1982
> 2011-05-02
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

From xiagao1982 at gmail.com Mon May 2 02:56:47 2011
From: xiagao1982 at gmail.com (xiagao1982)
Date: Mon, 2 May 2011 08:56:47 +0800
Subject: [R] How to pass objects from local() to GlobalEnv
References: <201105020820330263691@gmail.com>, 
Message-ID: <201105020856297706939@gmail.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From andrewdigby at mac.com Mon May 2 01:41:26 2011
From: andrewdigby at mac.com (adigs)
Date: Sun, 1 May 2011 16:41:26 -0700 (PDT)
Subject: [R] Sorting dataframe by number of occurrences of factor
In-Reply-To: 
References: <1304144278769-3485443.post@n4.nabble.com> <20110430090058.GA22211@praha1.ff.cuni.cz> <1304178250005-3486088.post@n4.nabble.com>
Message-ID: <1304293286795-3488978.post@n4.nabble.com>

That's great - thanks all for your help.
--
View this message in context: http://r.789695.n4.nabble.com/Sorting-dataframe-by-number-of-occurrences-of-factor-tp3485443p3488978.html
Sent from the R help mailing list archive at Nabble.com.

From philb at philbrierley.com Mon May 2 03:28:05 2011
From: philb at philbrierley.com (pdb)
Date: Sun, 1 May 2011 18:28:05 -0700 (PDT)
Subject: [R] caret - prevent resampling when no parameters to find
In-Reply-To: 
References: <1304287072729-3488761.post@n4.nabble.com>
Message-ID: <1304299685026-3489091.post@n4.nabble.com>

Hi Max,

I tried your suggestion but came up with errors:

fitControl<-trainControl(number=1)
LOGISTIC_model <- train(mdrrDescr,mdrrClass
                        ,method='glm'
                        ,trControl = fitControl
                        )

Fitting: parameter=none
Error in if (all.equal(sort(x$index[[1]]), seq(along = x$data$.outcome))) x$data else x$data[-x$index[[i]], :
  argument is not interpretable as logical

fitControl<-trainControl(seq(along = mdrrClass))
LOGISTIC_model <- train(mdrrDescr,mdrrClass
                        ,method='glm'
                        ,trControl = fitControl
                        )

Error in switch(tolower(trControl$method), oob = NULL, cv = createFolds(y, :
  EXPR must be a length 1 vector
In addition: Warning message:
In if (trControl$method == "oob" & !(method %in% c("rf", "treebag", :
  the condition has length > 1 and only the first element will be used
--
View this message in context: http://r.789695.n4.nabble.com/caret-prevent-resampling-when-no-parameters-to-find-tp3488761p3489091.html
Sent from the R help mailing list archive at Nabble.com.
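
A hedged sketch of one way to fit a single model without resampling. It assumes a caret release that supports trainControl(method = "none"); that option is not mentioned anywhere in this thread and may not have existed in the version used here. The iris subset is a stand-in for the mdrr data, and the accuracy computed on the training rows is the apparent (optimistic) figure, not a resampled estimate.

```r
# Assumption: caret is installed and recent enough for method = "none"
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)

  # Stand-in two-class data set (not the mdrr data from the thread)
  dat <- droplevels(iris[iris$Species != "setosa", ])

  # method = "none": fit once on the full training set, no resampling
  ctrl <- trainControl(method = "none")
  fit  <- train(x = dat[, 1:4], y = dat$Species,
                method = "glm", trControl = ctrl)

  # Accuracy on the training rows themselves: the apparent accuracy
  apparent_acc <- mean(predict(fit, dat[, 1:4]) == dat$Species)
  print(apparent_acc)
}
```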
From sharma.ram.h at gmail.com Mon May 2 03:48:00 2011 From: sharma.ram.h at gmail.com (Ram H. Sharma) Date: Sun, 1 May 2011 21:48:00 -0400 Subject: [R] quick help needed: split a number and "find and replace" type of function that works like in MS excel In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Achim.Zeileis at uibk.ac.at Mon May 2 09:23:32 2011 From: Achim.Zeileis at uibk.ac.at (Achim Zeileis) Date: Mon, 2 May 2011 09:23:32 +0200 (CEST) Subject: [R] multiple mosaic plots layout In-Reply-To: References: Message-ID: On Mon, 2 May 2011, baptiste auguie wrote: > Unfortunately, it seems that vcd doesn't return grobs but draws > directly to the device, which prevents a concise solution. Yes. The reason is that vcd was first written before grobs were available. When we need multiple plots in a single layout, we use Baptiste's second (more verbose) solution. A worked example is included in ?Ord_plot. > You could try the following, > > library(gridExtra) > library(vcd) > data("Titanic") > > p = grid.grabExpr(mosaic(Titanic)) > grid.arrange(p, p, p, ncol=2) > > Or, more versatile but also more verbose, > > pushViewport(...) > mosaic(...) > upViewport() > pushViewport(...) > mosaic(...) > upViewport() > etc.. > > HTH, > > baptiste > > On 2 May 2011 11:32, Neuwirth Erich wrote: >> I would like to display multiple mosaic plots from vcd (not defined by a model but derived from different data sets) >> side by side. >> Neither par(mfrow=...) >> nor layout seem to allow to arrange multiple mosaic plots in a grid. >> Is there an easy way of arranging mosaics in a grid? >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
>> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From ligges at statistik.tu-dortmund.de Mon May 2 10:16:04 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 10:16:04 +0200 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: References: Message-ID: <4DBE6844.1090009@statistik.tu-dortmund.de> To get the statsitics, you will have to run each wilcox.test manually. the pairwise... version just extracts the p-values and adjusts them. Uwe Ligges On 28.04.2011 15:18, JP wrote: > Hi there, > > I am trying to do multiple pairwise Wilcoxon signed rank tests in a > manner similar to: > > a<- c(runif(1000, min=1,max=50), rnorm(1000, 50), rnorm(1000, 49.9, > 0.5), rgeom(1000, 0.5)) > b<- c(rep("group_a", 1000), rep("group_b", 1000), rep("group_c", > 1000), rep("group_d", 1000)) > pairwise.wilcox.test(a, b, alternative="two.sided", > p.adj="bonferroni", exact=F, paired=T) > > This gives me the following output: > > group_a group_b group_c > group_b<2e-16 - - > group_c<2e-16 0.25 - > group_d<2e-16<2e-16<2e-16 > > (which is kind of expected since group_b and group_c have similar distributions) > > I have found that when doing a wilcoxon signed ranked test you should report: > > - The median value (and not the mean or sd, presumably because of the > underlying potential non normal distribution) > - The Z score (or value) > - r > - p value > > My questions are: > > - Are the above enough/correct values to report (some places even > quote W and df) ? What else would you suggest? > - How do I calculate the Z score and r for the above example? > - How do I get each statistic from the pairwise.wilcox.test call? 
>
> Many Thanks
> JP

From joelr1 at gmail.com Mon May 2 10:18:41 2011
From: joelr1 at gmail.com (Joel Reymont)
Date: Mon, 2 May 2011 09:18:41 +0100
Subject: [R] importing and filtering time series data
In-Reply-To:
References: <13AE38B7-6EE4-41C6-85B5-4256C2875457@gmail.com>
Message-ID:

My current code looks like this. Anything that can be improved?

#! /usr/bin/rscript

# install.packages(c('zoo','xts'))
library(zoo)
library(xts)

req_stats <- function(data, type = NA) {
  if (is.na(type))
    csv <- data
  else
    # subset of data matching our request type
    csv <- subset(data, Kind == type)
  # import into a time series
  x <- xts(csv$Duration, as.POSIXct(csv$Time))
  # requests per second
  rps <- period.apply(x, endpoints(x, 'seconds'), length)
  # stats
  c(length(x), mean(x), var(x), quantile(x, c(.05, .95)), mean(rps))
  # indexFormat(x) <- "%Y-%m-%d %H:%M:%OS"
  # options(digits.secs=6)
}

# assumes column headers
data <- read.csv("benchie.csv")
# take out the rows with "N"
all <- subset(data, Include == "Y")

# Kind: R = sidebar request, C = sidebar click, U = upload doc, A = create ad
sidebar_req <- req_stats(all, "R")
# sidebar_click <- req_stats(all, "C")
doc_upload <- req_stats(all, "U")
ad_create <- req_stats(all, "A")
all <- req_stats(all)

# mdat <- rbind(all, sidebar_req, sidebar_click, doc_upload, ad_create)
# rownames(mdat) <- c("all", "sidebar req", "sidebar click", "doc upload", "ad create")
mdat <- rbind(all, sidebar_req, doc_upload, ad_create)
rownames(mdat) <- c("all", "sidebar req", "doc upload", "ad create")
colnames(mdat) <- c("count", "mean", "var", "5%", "95%", "rps")

print(round(mdat, digits = 3))

--------------------------------------------------------------------------
- for hire: mac
osx device driver ninja, kernel extensions and usb drivers ---------------------+------------+--------------------------------------- http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont ---------------------+------------+--------------------------------------- From andresago1 at hotmail.com Mon May 2 09:55:27 2011 From: andresago1 at hotmail.com (andre bedon) Date: Mon, 2 May 2011 17:55:27 +1000 Subject: [R] vector decreasing by a factor Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Mon May 2 10:29:25 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 10:29:25 +0200 Subject: [R] bootstrapping problem In-Reply-To: <1304060517884-3483068.post@n4.nabble.com> References: <1304060517884-3483068.post@n4.nabble.com> Message-ID: <4DBE6B65.2060100@statistik.tu-dortmund.de> On 29.04.2011 09:01, mimato wrote: > I want to classify bipolar neurons in human cochleas and have data of the > following structure: > > Vol_Nuc Vol_Soma > 1 186.23 731.96 > 2 204.58 4370.96 > 3 539.98 7344.86 > 4 477.71 6939.28 > 5 421.22 5588.53 > 6 276.61 1017.05 > 7 392.28 6392.32 > 8 424.43 6190.13 > 9 256.41 3850.51 > 10 249.17 3118.14 > 11 276.97 3037.29 > 12 295.30 3703.76 > 13 314.43 5265.97 > 14 301.15 5781.73 > > I already worked with Matlab (I?m not a programmer) and created nice > colourcoded dendrograms and also made some verifications of them. I started > now working with R and bootstrapped data with a library named pvclust. It > worked and R computed ... 
>
> here is the code:
>
> library(pvclust)
>
> data =
> data.frame(Vol_Nuk=c(186.23,204.58,539.98,477.71,421.22,276.61,392.28,424.43,256.41,249.17,276.97,295.3,314.43,301.15),
> Vol_Soma=c(731.96,4370.96,7344.86,6939.28,5588.53,1017.05,6392.32,6190.13,3850.51,3118.14,3037.29,3703.76,5265.97,5781.73))
>
> plot(data)
> result<-pvclust(data,nboot=100)
> plot(result)
>
> It is also not working using following commands:
>
> cluster.bootstrap<- pvclust(Raw, nboot=1000, method.dist="abscor")
> plot(cluster.bootstrap)
> pvrect(cluster.bootstrap)
>
> I always get the following problem:
>
> mistake in plot.hclust(x$hclust, main = main, sub = sub, xlab = xlab, col =
> col, :
> invalid input for Dendrogram
>
> Does anyone has an idea whats wrong...

Yes: You have only 2 variables, and when clustering these 2 variables it makes no sense to plot the result. I guess you want to cluster observations rather than variables? Then transpose your data before applying pvclust.

Best,
Uwe Ligges

> Thanks a lot!!
>
> --
> View this message in context: http://r.789695.n4.nabble.com/bootstrapping-problem-tp3483068p3483068.html
> Sent from the R help mailing list archive at Nabble.com.

From jabbba at gmail.com Mon May 2 10:41:05 2011
From: jabbba at gmail.com (Marco Barbàra)
Date: Mon, 2 May 2011 10:41:05 +0200
Subject: [R] help with a survplot
In-Reply-To: <85D19B4D-6ED7-4A15-BC43-14F8F885C062@comcast.net>
References: <20110430164400.531dfe9a@caprica> <85D19B4D-6ED7-4A15-BC43-14F8F885C062@comcast.net>
Message-ID: <20110502104105.1660c591@caprica>

Thank you very much. Despite Prof. Harrell's support (for whom I feel great esteem) I still remain doubtful about this feature.
From ted.harding at wlandres.net Mon May 2 10:50:59 2011 From: ted.harding at wlandres.net ( (Ted Harding)) Date: Mon, 02 May 2011 09:50:59 +0100 (BST) Subject: [R] vector decreasing by a factor In-Reply-To: Message-ID: On 02-May-11 07:55:27, andre bedon wrote: > Hi, > I'm quite new to R so this question will sound quite fundamental. > I need to create a vector of length 160. The first element should > be (1+r)^159 and each element thereafter should decrease by a > factor of (1+r) until the 160th element that should be 1. > Is there a function similar to seq() but increasing or decreasing > by factors? I need to do this in one step i.e, not using loops. > Any help would be greatly appreciated. > Regards, > Andre One expression which would do what you want is rev((1+r)^(0:159)) though there may be more efficient ways to do it. This assumes that r, hence (1+r), is given. If you are given the value X1 of the first element, which is to be interpreted as (1+r)^159, then perhaps take (1+r) as X1^(1/159), though there is a potential slight inaccuracy in recovering X0 from (1+r)^159. So check this first. Hoping this helps, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 02-May-11 Time: 09:50:55 ------------------------------ XFMail ------------------------------ From ligges at statistik.tu-dortmund.de Mon May 2 10:51:31 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 10:51:31 +0200 Subject: [R] vector decreasing by a factor In-Reply-To: References: Message-ID: <4DBE7093.5000303@statistik.tu-dortmund.de> On 02.05.2011 09:55, andre bedon wrote: > > Hi, > I'm quite new to R so this question will sound quite fundamental. I need to create a vector of length 160. The first element should be (1+r)^159 and each element thereafter should decrease by a factor of (1+r) until the 160th element that should be 1. 
Is there a function similar to seq() but increasing or decreasing by factors? I need to do this in one step, i.e., not using loops. Any help would be greatly appreciated.

Yes:

(1+r)^(159:0)

Uwe Ligges

> Regards,
> Andre
>
> [[alternative HTML version deleted]]

From ligges at statistik.tu-dortmund.de Mon May 2 10:57:49 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Mon, 02 May 2011 10:57:49 +0200
Subject: [R] bwplot in ascending order
In-Reply-To:
References:
Message-ID: <4DBE720D.5010901@statistik.tu-dortmund.de>

On 01.05.2011 22:52, Doran, Harold wrote:
> Can anyone point me to examples with R code where bwplot in lattice is used to order the boxes in ascending order? I have found the following discussion and it partly works. But, I have a conditioning variable, so my example is more like
>
> bwplot(var1 ~ var2|condition, dat)

I guess you are looking for something along the lines of

bwplot(var1 ~ var2 | reorder(condition, var2, median), dat)

Uwe Ligges

>
> The example in the discussion below works only when there is not a conditioning variable as far as I can tell. I can tweak the example below to work, but then I get some ugly labels in the lattice plot. It seems index.cond is supposed to help me solve this, but I cannot find good examples showing its use.
>
> Thanks
> Harold
>
> http://r.789695.n4.nabble.com/bwplot-reorder-factor-on-y-axis-td790903.html
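A minimal sketch of how reorder() changes the order of factor levels, which is what lattice then uses to order boxes (for the factor on the axis) or panels (for the conditioning factor). The data frame is made up for illustration; `var1` is assumed to be numeric and `var2` the grouping factor, which the thread does not state. The bwplot() calls are left commented out so the block runs without a graphics device.

```r
set.seed(1)
dat <- data.frame(
  var2      = factor(rep(c("a", "b", "c"), each = 20)),
  condition = factor(rep(c("x", "y"), times = 30))
)
# Group means chosen so that the medians come out as b < c < a.
dat$var1 <- rnorm(60, mean = c(a = 10, b = 0, c = 5)[as.character(dat$var2)])

levels(dat$var2)                      # alphabetical order

# Levels re-sorted by the median of var1 within each level of var2.
ordered_var2 <- reorder(dat$var2, dat$var1, median)
levels(ordered_var2)                  # ascending median of var1

# library(lattice)
# bwplot(var1 ~ reorder(var2, var1, median) | condition, dat)  # reorders the boxes
# bwplot(var1 ~ var2 | reorder(condition, var1, median), dat)  # reorders the panels
```

Which of the two forms is wanted depends on whether the boxes or the panels should be sorted; the thread leaves that open.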
From Bernhard_Pfaff at fra.invesco.com Mon May 2 11:01:17 2011 From: Bernhard_Pfaff at fra.invesco.com (Pfaff, Bernhard Dr.) Date: Mon, 2 May 2011 10:01:17 +0100 Subject: [R] question of VECM restricted regression In-Reply-To: References: Message-ID: Hello Meilan: 'ect' is shorthand for error-correction-term, 'sd' signify seasonal dummy variables and 'LRM.dl1' is the lagged first difference of the variable 'LRM' (the log of real money demand). HTH, Bernhard > -----Urspr?ngliche Nachricht----- > Von: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] Im Auftrag von Meilan Yan > Gesendet: Freitag, 29. April 2011 11:10 > An: bernhard.pfaff at pfaffikus.de; r-help at r-project.org > Betreff: [R] question of VECM restricted regression > > Dear Colleague > > I am trying to figure out how to use R to do OLS restricted > VECM regression. However, there are some notation I cannot understand. > > Please tell me what is 'ect', 'sd' and 'LRM.dl1 in the > following practice: > > #OLS retricted VECM regression > data(denmark) > sjd <- denmark[, c("LRM", "LRY", "IBO", "IDE")] > sjd.vecm<- ca.jo(sjd, ecdet = "const", type="eigen", K=2, > spec="longrun", > season=4) > sjd.vecm.rls<-cajorls(sjd.vecm,r=1) > summary(sjd.vecm.rls$rlm) > sjd.vecm.rls$beta > > Response LRM.d : > Call: > lm(formula = substitute(LRM.d), data = data.mat) > > Residuals: > Min 1Q Median 3Q Max > -0.027598 -0.012836 -0.003395 0.015523 0.056034 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > ect1 -0.212955 0.064354 -3.309 0.00185 ** > sd1 -0.057653 0.010269 -5.614 1.16e-06 *** > sd2 -0.016305 0.009177 -1.777 0.08238 . 
> sd3 -0.040859 0.008767 -4.660 2.82e-05 *** > LRM.dl1 0.049816 0.191992 0.259 0.79646 > LRY.dl1 0.075717 0.157902 0.480 0.63389 > IBO.dl1 -1.148954 0.372745 -3.082 0.00350 ** > IDE.dl1 0.227094 0.546271 0.416 0.67959 > > > sjd.vecm.rls$beta > ect1 > LRM.l2 1.000000 > LRY.l2 -1.032949 > IBO.l2 5.206919 > IDE.l2 -4.215879 > > > Many thanks > Meilan > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ***************************************************************** Confidentiality Note: The information contained in this ...{{dropped:10}} From matevz.pavlic at gi-zrmk.si Mon May 2 11:03:52 2011 From: matevz.pavlic at gi-zrmk.si (=?iso-8859-2?Q?Matev=BE_Pavli=E8?=) Date: Mon, 2 May 2011 11:03:52 +0200 Subject: [R] subseting data Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Mon May 2 11:31:00 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 11:31:00 +0200 Subject: [R] strange fluctuations in system.time with kernapply In-Reply-To: <4DBB2FBB.1040007@physik.hu-berlin.de> References: <1303926914164-3478961.post@n4.nabble.com> <1304103421696-3484371.post@n4.nabble.com> <4DBB2FBB.1040007@physik.hu-berlin.de> Message-ID: <4DBE79D4.7060503@statistik.tu-dortmund.de> On 29.04.2011 23:38, Alexander Senger wrote: > Hello expeRts, > > > here is something which strikes me as kind of odd and I would like to > ask for some enlightenment: > > First let's do this: > > tkern <- kernel("modified.daniell", c(5,5)) > test <- rep(1,1000000) > system.time(kernapply(test,tkern)) > User System verstrichen > 1.100 0.040 1.136 > > That was easy. 
Now this:
>
> test <- rep(1,1100000)
> system.time(kernapply(test,tkern))
> User System verstrichen
> 1.40 0.02 1.43
>
> Still fine. Now this:
>
> test <- rep(1,1110000)
> system.time(kernapply(test,tkern))
> User System verstrichen
> 1.390 0.020 1.409
>
> Ok, by now it seems boring. But wait:
>
> test <- rep(1,1110300)
> system.time(kernapply(test,tkern))
> User System verstrichen
> 12.270 0.030 12.319
>
> There is a sudden - and repeatable! - jump in the time needed to execute
> kernapply. At least from a naive point of view there should not be much
> difference between applying a kernel to a vector 1110000 or 1110300
> entries long. But maybe there is some limit here?
>
> So I tried this:
>
> test <- rep(1,1110400)
> system.time(kernapply(test,tkern))
> User System verstrichen
> 1.96 0.01 1.97
>
> which doesn't fit into the pattern. But the best thing is still to come.
> When I try this
>
> test <- rep(1,1110308)
> system.time(kernapply(test,tkern))
>
> then the computer starts to run and does so for longer than 15 minutes
> until when I normally kill the process. As noted above this behaviour is
> repeatable and occurs every time I issue these commands.
>
> I really would like to know if there is some magic to the number 1110308
> I'm not aware of.

The magic is that the length of the vector, 1110308, is inefficient for the fft() used within kernapply(). You need highly composite lengths (ideally integer powers of 2) for a really fast FFT.
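As a hedged aside: the R documentation for fft() states that it is fastest when the series length is highly composite, with powers of 2 as the familiar special case. A small sketch for inspecting the lengths from this thread; `prime_factors` is a helper written for this example (plain trial division), and nextn() from the stats package returns the next length at or above its argument that factors into the given primes (2, 3 and 5 by default).

```r
# Helper for illustration only: prime factorisation by trial division.
prime_factors <- function(n) {
  f <- numeric(0)
  d <- 2
  while (d * d <= n) {
    while (n %% d == 0) {
      f <- c(f, d)
      n <- n / d
    }
    d <- d + 1
  }
  if (n > 1) f <- c(f, n)
  f
}

prime_factors(1110000)   # only small prime factors
prime_factors(1110300)   # contains a four-digit prime factor
prime_factors(1110308)   # compare its largest factor with the two above

# Candidate lengths one could zero-pad to before transforming:
nextn(1110308)               # next length built from 2, 3 and 5
nextn(1110308, factors = 2)  # next power of 2
```

Whether padding is acceptable depends on the analysis; it changes the series being filtered, so this is a way to look at the problem rather than a drop-in fix.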
You can also try smaller numbers to get longer runtimes, e.g.: 100003

As an example, compare:

system.time(fft(rep(1, 32768))) # roughly 0 seconds
system.time(fft(rep(1, 32771))) # almost 10 seconds

Uwe Ligges

>
>
> Last but not least, here is my
>
> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=de_DE.utf8 LC_NUMERIC=C
> [3] LC_TIME=de_DE.utf8 LC_COLLATE=de_DE.utf8
> [5] LC_MONETARY=C LC_MESSAGES=de_DE.utf8
> [7] LC_PAPER=de_DE.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.1
>
>
> Thank you,
>
> Alex
>

From A.Robinson at ms.unimelb.edu.au Mon May 2 11:37:16 2011
From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson)
Date: Mon, 2 May 2011 19:37:16 +1000
Subject: [R] subseting data
In-Reply-To:
References:
Message-ID: <20110502093716.GH48756@ms.unimelb.edu.au>

I wonder if grep() will help you?

Cheers

Andrew

On Mon, May 02, 2011 at 11:03:52AM +0200, Matevž Pavlič wrote:
> Hi,
>
> Is it possible (i am sure it is) to subset data from a data.frame on the basis of SQL »LIKE« operator. I.e., i would like to subset a data where only values which contains a string »GP« would be used?
>
> Example:
>
> Gp<-subset(DF, DF$USCS like »GP«)
>
> This like of course is not working,
>
> Thanks, m
>
> [[alternative HTML version deleted]]
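A minimal sketch of the grep() idea for the question quoted above, using grepl() (the variant that returns a logical vector) inside subset(). The data frame `DF` and its values are invented for illustration; only the column name `USCS` comes from the question.

```r
DF <- data.frame(
  id   = 1:5,
  USCS = c("GP", "GP-GM", "SW", "CL", "SP-GP"),
  stringsAsFactors = FALSE
)

# Rows whose USCS value contains "GP" anywhere (roughly SQL's LIKE '%GP%').
# fixed = TRUE treats the pattern as a literal string, not a regular expression.
Gp <- subset(DF, grepl("GP", USCS, fixed = TRUE))

# Rows that do NOT contain "GP": negate the logical vector.
notGp <- subset(DF, !grepl("GP", USCS, fixed = TRUE))

Gp
notGp
```

Note the argument order: the pattern comes first and the vector to search comes second.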
--
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/

From matevz.pavlic at gi-zrmk.si Mon May 2 11:47:43 2011
From: matevz.pavlic at gi-zrmk.si (Matevž Pavlič)
Date: Mon, 2 May 2011 11:47:43 +0200
Subject: [R] subseting data
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From ligges at statistik.tu-dortmund.de Mon May 2 11:52:14 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Mon, 02 May 2011 11:52:14 +0200
Subject: [R] subseting data
In-Reply-To:
References:
Message-ID: <4DBE7ECE.9040006@statistik.tu-dortmund.de>

On 02.05.2011 11:47, Matevž Pavlič wrote:
> Hi,
>
> When I use your code i get this :
>
>> dat<-data.frame(test=c("abc","cdf","dabc"))
>
>> d<-subset(dat,grepl(test,"abc"))

d <- subset(dat, grepl("abc", test))

>
>> d
>
> Warning message:
>
> In grepl(test, "abc") :
>
> argument 'pattern' has length> 1 and only the first element will be used
>
>> d
>
> test
>
> 1 abc
>
> 2 cdf
>
> 3 dabc
>
> I can't seem to make it work. Also how would i use the grepl() to select only those that are not like i.e. »GP«?

Use the negation:

d <- subset(dat, !grepl("abc", test))

Uwe Ligges

>
>
> Thanks, m
>
>
> From: Steven Kennedy [mailto:stevenkennedy2263 at gmail.com]
> Sent: Monday, May 02, 2011 11:30 AM
> To: Matevž Pavlič
> Cc: r-help at r-project.org > Subject: Re: [R] subseting data > > > > You can use grepl: > >> dat<-data.frame(test=c("abc","cdf","dabc")) >> d<-subset(dat,grepl(test,"abc")) >> d > test > 1 abc > 3 dabc > > > > > On Mon, May 2, 2011 at 7:03 PM, Matev?? Pavli?? wrote: > > Hi, > > > > Is it possible (i am sure it is) to subset data from a data.frame on the basis of SQL>LIKE< operator. I.e., i would like to subset a data where only values which contains a string>GP< would be used? > > > > Example: > > > > Gp<-subset(DF, DF$USCS like>GP<) > > > > This like of course is not working, > > > > Thanks, m > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > > > [[alternative HTML version deleted]] > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Mon May 2 12:02:50 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 12:02:50 +0200 Subject: [R] --mem-vsize in R In-Reply-To: <1304108068999-3484541.post@n4.nabble.com> References: <1304108068999-3484541.post@n4.nabble.com> Message-ID: <4DBE814A.6050306@statistik.tu-dortmund.de> On 29.04.2011 22:14, kparamas wrote: > Hi, > > I am calculation pairwise correlation coefficient for a matrix of 234 X > 30000. > I am getting the following error, > Error in cbind(as.vector(row(cl)), as.vector(col(cl)), as.vector(cl)) : > allocMatrix: too many elements specified The problem is that you try to create a matrix with 3 * nrow(cl) * ncol(cl) elements here. 
The maximal number of elements in one single vector or matrix is 2^31 - 1. You can have several of those, if you have a sufficient amount of RAM, tough. Uwe Ligges > In addition: There were 50 or more warnings (use warnings() to see the first > 50) > > The function used is, > corGraphPearson = function(cData, COR) #COR is threshold 0.5,0.7, etc > { > > cl = unname(cor(cData, use="pairwise.complete.obs", method="pearson")) > > result = cbind(as.vector(row(cl)),as.vector(col(cl)),as.vector(cl)) > result = result[result[,1] != result[,2],] > > corm = result > > # remove low cor pairs > corm =corm[abs(corm[,3])>= COR, ] > # the network > net<- network(corm, directed = F) > } > > > I am running this in a cluster with 4 machines with 24 GB memory each. > > How should I start R so that I make max use of the memory availbale? > Or how to overcome this issue? > > -- > View this message in context: http://r.789695.n4.nabble.com/mem-vsize-in-R-tp3484541p3484541.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From pburns at pburns.seanet.com Mon May 2 12:14:06 2011 From: pburns at pburns.seanet.com (Patrick Burns) Date: Mon, 02 May 2011 11:14:06 +0100 Subject: [R] The R Inferno revised Message-ID: <4DBE83EE.1070801@pburns.seanet.com> Hell is new and improved. 
The new version is in the same old place:
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

An explanation of the changes is at:
http://www.portfolioprobe.com/2011/05/02/the-r-inferno-revised/

--
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')

From alaios at yahoo.com Mon May 2 12:35:52 2011
From: alaios at yahoo.com (Alaios)
Date: Mon, 2 May 2011 03:35:52 -0700 (PDT)
Subject: [R] Bechmarking my code
Message-ID: <237869.91926.qm@web120108.mail.ne1.yahoo.com>

Dear all,
I have written a fairly large piece of code that takes about 6 hours to execute (measured with system.time).

I was wondering if it is possible to find out which pieces of the code are the most time consuming, so that I can try to improve them.

Could you please help me understand how I can do this sort of benchmarking?

Best Regards
Alex

From mxkuhn at gmail.com Mon May 2 12:51:32 2011
From: mxkuhn at gmail.com (Max Kuhn)
Date: Mon, 2 May 2011 06:51:32 -0400
Subject: [R] caret - prevent resampling when no parameters to find
In-Reply-To: <1304299685026-3489091.post@n4.nabble.com>
References: <1304287072729-3488761.post@n4.nabble.com> <1304299685026-3489091.post@n4.nabble.com>
Message-ID:

Yeah, that didn't work. Use

fitControl<-trainControl(index = list(seq(along = mdrrClass)))

See ?trainControl to understand what this does in detail.
Max From ligges at statistik.tu-dortmund.de Mon May 2 13:10:24 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 13:10:24 +0200 Subject: [R] Bechmarking my code In-Reply-To: <237869.91926.qm@web120108.mail.ne1.yahoo.com> References: <237869.91926.qm@web120108.mail.ne1.yahoo.com> Message-ID: <4DBE9120.2030207@statistik.tu-dortmund.de> On 02.05.2011 12:35, Alaios wrote: > Dear all, > I have written a quite big piece of code that takes like 6 hourse to execute (measured that with system.time). > > I was wondering if it is possible to try to further understand which are the pieces of code that are more time consuming so to try to improve them. > > Could you please help me understand how I can do this sort of benchmarking? Use profiling. See ?Rprof and the manuals to get a first idea. Uwe Ligges > Best Regards > Alex > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Mon May 2 13:13:07 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 13:13:07 +0200 Subject: [R] The bin/R file - hardcoded paths In-Reply-To: References: Message-ID: <4DBE91C3.5050100@statistik.tu-dortmund.de> From the manual R Installation and Administration (that you should have read yourself before posting): "You can install into another directory tree by using make prefix=/path/to/here install" Uwe Ligges On 29.04.2011 19:42, Saptarshi Guha wrote: > Hello, > > I notice that e.g /home/sguha/lib64 is hard coded into the /bin/R file . > I nstalled R as ./configure --prefix=$HOME ... > > What i need to do is ship the entire R distribution to remote nodes, > and run R. These are shipped to ephemeral directories > so I dont know the path ahead of time. 
>
> R_HOME doesn't change things either.
>
> So i guess one cant run R on a system unless it's been installed?
>
> 1. I can't install R on the compute nodes using ./configure ....
> 2. All nodes do have the same architecture
> 3. I would like to stick to the 'shipping' approach.
>
>
> Thanks
> Saptarshi

From ligges at statistik.tu-dortmund.de Mon May 2 13:16:30 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Mon, 02 May 2011 13:16:30 +0200
Subject: [R] logistic regression with glm: cooks distance and dfbetas are different compared to SPSS output
In-Reply-To: <4DBAE774.8050707@charite.de>
References: <4DBAE774.8050707@charite.de>
Message-ID: <4DBE928E.3040806@statistik.tu-dortmund.de>

On 29.04.2011 18:29, "Biedermann, Jürgen" wrote:
> Hi there,
>
> I have the problem, that I'm not able to reproduce the SPSS residual
> statistics (dfbeta and cook's distance) with a simple binary logistic
> regression model obtained in R via the glm-function.
>
> I tried the following:
>
> fit <- glm(y ~ x1 + x2 + x3, data, family=binomial)
>
> cooks.distance(fit)#

Just type stats::cooks.distance.glm and see the definition in R yourself:

function (model, infl = influence(model, do.coef = FALSE),
    res = infl$pear.res, dispersion = summary(model)$dispersion,
    hat = infl$hat, ...)
{
    p <- model$rank
    res <- (res/(1 - hat))^2 * hat/(dispersion * p)
    res[is.infinite(res)] <- NaN
    res
}

Now you can dig further yourself. I do not know how to find the algorithm actually used by SPSS, hence I cannot tell what is different.
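The definition printed above can be checked by hand. A minimal sketch, using the built-in mtcars data and an arbitrary model (vs ~ mpg) chosen only for illustration, that recomputes Cook's distance from the Pearson residuals and hat values and compares the result with cooks.distance(). Whether SPSS uses the same formula is not established here.

```r
fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

infl <- influence(fit, do.coef = FALSE)  # hat values and Pearson residuals
p    <- fit$rank                         # number of estimated coefficients
phi  <- summary(fit)$dispersion          # fixed at 1 for the binomial family

# Same expression as in the function body shown in the message.
manual <- (infl$pear.res / (1 - infl$hat))^2 * infl$hat / (phi * p)

all.equal(manual, cooks.distance(fit))
```

Having the pieces (`pear.res`, `hat`, dispersion, rank) available separately makes it easier to test candidate formulas against the numbers another package reports.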
Uwe Ligges > dfbetas(fit) > > When i compare the returned values with the values that I get in SPSS, > they are different, although the same model is calculated (the > coefficients are the same etc.) > > It seems that different calculation-formulas are used for cooks.distance > and dfbetas in SPSS compared to R. > > Unfortunately I didn't find out, what's the difference in the > calculation and how I could get R to calculate me the same statistics > that SPSS uses. > Or is this an unknown SPSS bug? > > Greetings > J?rgen > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Mon May 2 13:17:40 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 13:17:40 +0200 Subject: [R] Specify custom par(mfrow()) layout for defined plot() In-Reply-To: <4dbad4de.4696cc0a.4bc9.ffffa065@mx.google.com> References: <4dbad4de.4696cc0a.4bc9.ffffa065@mx.google.com> Message-ID: <4DBE92D4.3000301@statistik.tu-dortmund.de> On 29.04.2011 17:10, Michael Bach wrote: > Dear R Users, > > I am doing stats::decompose() on 4 different time series. When I issue > > csdA<- decompose(tsA) > plot(csdA) > > I get a summary plot for observed, trend, seasonal and random components > of decomposed time series tsA. As I understand it, the object returned > by decompose() has it's own plot method where mfrow(4,1) etc. is > defined. Now suppose I wanted to wrap those mfrow(4,1) into my own > mfrow(2,2) layout. How could I achieve this? Is there a general way to > handle these cases? Something like a "meta" par(mfrow())? This does not work and is one of the reasons why the grid package was developed. 
Uwe Ligges > Best Regards, > Michael Bach > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From tal.galili at gmail.com Mon May 2 13:20:36 2011 From: tal.galili at gmail.com (Tal Galili) Date: Mon, 2 May 2011 14:20:36 +0300 Subject: [R] Tests for the need of cluster analysis In-Reply-To: <1304268859064-3488097.post@n4.nabble.com> References: <1304268859064-3488097.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From oliver.sonnentag at gmail.com Mon May 2 14:25:36 2011 From: oliver.sonnentag at gmail.com (Oliver Sonnentag) Date: Mon, 2 May 2011 08:25:36 -0400 Subject: [R] (no subject) Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From hill0093 at umn.edu Mon May 2 14:27:01 2011 From: hill0093 at umn.edu (Hurr) Date: Mon, 2 May 2011 05:27:01 -0700 (PDT) Subject: [R] Copying to R a rectangular array from a Java class In-Reply-To: <1304180854651-3486167.post@n4.nabble.com> References: <1304180854651-3486167.post@n4.nabble.com> Message-ID: <1304339221518-3489899.post@n4.nabble.com> I discovered that a row of a rectangular array returns, but a function parameter is not sent to Java. 
Appended bare test code:
My simple Java test class source and R test code follow:

public class RJavTest {
  public static void main(String[]args) {
    RJavTest rJavTest=new RJavTest();
  }
  public final static String conStg="testString";
  public final static double con0dbl=10000001;
  public final static double[]con1Arr=new double[]
    { 10001,10002,10003,10004,10005,10006 };
  public final static double[][]con2Arr=new double[][]
    { { 101,102,103,104 },{ 201,202,203,204 },{ 301,302,303,304 } };
  public final static String retConStg()
    { return(conStg); }
  public final static double retCon0dbl()
    { return(con0dbl); }
  public final static double[] retCon1Arr()
    { return(con1Arr); }
  public final static double[] retCon2Row0()
    { return(con2Arr[0]); }
  public final static double[] retCon2Row(int row)
    { return(con2Arr[row]); }
  public final static double[][] retCon2Arr()
    { return(con2Arr); }
}

library(rJava)
.jinit()
.jaddClassPath("C:/ad/j") # a directory on my disk
print(.jclassPath())
rJavaTst <- .jnew("RJavTest") # compiled java to class file
connStg <- .jfield(rJavaTst,sig="S","conStg")
print(connStg)
connStgRet <- .jcall(rJavaTst,returnSig="S","retConStg")
print(connStgRet)
conn1Arr <- .jfield(rJavaTst,sig="[D","con1Arr")
print(conn1Arr)
print(conn1Arr[2])
conn1ArrRet <- .jcall(rJavaTst,returnSig="[D","retCon1Arr")
print(conn1ArrRet)
print(conn1ArrRet[2])
conn0dbl <- .jfield(rJavaTst,sig="D","con0dbl")
print(conn0dbl,digits=15)
conn2Row0Ret <- .jcall(rJavaTst,returnSig="[D","retCon2Row0")
print(conn2Row0Ret)
print(conn2Row0Ret[2])
# The above is education, questions on rectangular and parameters are below
conn2Arr <- .jfield(rJavaTst,sig="[[D","con2Arr")
conn2ArrRet <- .jcall(rJavaTst,returnSig="[[D","retCon2Arr")
# I can't identify any complaints so far
print(conn2Arr)
print(conn2ArrRet)
conn2RowRet <- .jcall(rJavaTst,returnSig="[D","retCon2Row",0)
print(conn2RowRet)
print(conn2RowRet[2])
# But what meaning should I get from these strange messages?
The results are:

> library(rJava)
> .jinit()
> .jaddClassPath("C:/ad/j") # a directory on my disk
> print(.jclassPath())
[1] "C:\\Users\\ENVY17\\Documents\\R\\win-library\\2.12\\rJava\\java"
[2] "C:\\ad\\j"
> rJavaTst <- .jnew("RJavTest") # compiled java to class file
> connStg <- .jfield(rJavaTst,sig="S","conStg")
> print(connStg)
[1] "testString"
> connStgRet <- .jcall(rJavaTst,returnSig="S","retConStg")
> print(connStgRet)
[1] "testString"
> conn1Arr <- .jfield(rJavaTst,sig="[D","con1Arr")
> print(conn1Arr)
[1] 10001 10002 10003 10004 10005 10006
> print(conn1Arr[2])
[1] 10002
> conn1ArrRet <- .jcall(rJavaTst,returnSig="[D","retCon1Arr")
> print(conn1ArrRet)
[1] 10001 10002 10003 10004 10005 10006
> print(conn1ArrRet[2])
[1] 10002
> conn0dbl <- .jfield(rJavaTst,sig="D","con0dbl")
> print(conn0dbl,digits=15)
[1] 10000001
> conn2Row0Ret <- .jcall(rJavaTst,returnSig="[D","retCon2Row0")
> print(conn2Row0Ret)
[1] 101 102 103 104
> print(conn2Row0Ret[2])
[1] 102
> # The above is education, questions on rectangular and parameters are below
> conn2Arr <- .jfield(rJavaTst,sig="[[D","con2Arr")
> conn2ArrRet <- .jcall(rJavaTst,returnSig="[[D","retCon2Arr")
> # I can't identify any complaints so far
> print(conn2Arr)
[[1]]
[1] "Java-Array-Object[D:[D@66848c"

[[2]]
[1] "Java-Array-Object[D:[D@8813f2"

[[3]]
[1] "Java-Array-Object[D:[D@1d58aae"

> print(conn2ArrRet)
[[1]]
[1] "Java-Array-Object[D:[D@66848c"

[[2]]
[1] "Java-Array-Object[D:[D@8813f2"

[[3]]
[1] "Java-Array-Object[D:[D@1d58aae"

> conn2RowRet <- .jcall(rJavaTst,returnSig="[D","retCon2Row",0)
Error in .jcall(rJavaTst, returnSig = "[D", "retCon2Row", 0) :
  method retCon2Row with signature (D)[D not found
> print(conn2RowRet)
Error in print(conn2RowRet) : object 'conn2RowRet' not found
> print(conn2RowRet[2])
Error in print(conn2RowRet[2]) : object 'conn2RowRet' not found
> # But what meaning should I get from these strange messages?
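
A hedged reading of the two symptoms above, with a sketch. The error text mentions signature (D)[D, which suggests the literal 0 was sent as a double while the Java method expects an int; the list of "Java-Array-Object" entries suggests each row of the double[][] arrived as an unevaluated array reference. The helper names get_row and as_r_matrix are made up for illustration, and the rJava calls are only wrapped, not run, so a working JVM and the RJavTest class are assumed when they are actually called:

```r
# R numeric literals are doubles unless written with an L suffix.
typeof(0)   # "double"  -> matched against retCon2Row(double), i.e. (D)[D
typeof(0L)  # "integer" -> matched against retCon2Row(int),    i.e. (I)[D

# Hypothetical helper: fetch one row by passing a genuine integer index.
get_row <- function(obj, row) {
  rJava::.jcall(obj, returnSig = "[D", "retCon2Row", as.integer(row))
}

# Hypothetical helper: evaluate each Java double[] reference and stack the
# results so that one Java row becomes one R matrix row.
as_r_matrix <- function(jrows) {
  do.call(rbind, lapply(jrows, rJava::.jevalArray))
}
```

With the objects from the transcript, get_row(rJavaTst, 0) and as_r_matrix(conn2Arr) would be the corresponding calls.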
> -- View this message in context: http://r.789695.n4.nabble.com/Copying-to-R-a-rectangular-array-from-a-Java-class-tp3486167p3489899.html Sent from the R help mailing list archive at Nabble.com. From azamjaafari at yahoo.com Mon May 2 14:42:21 2011 From: azamjaafari at yahoo.com (azam jaafari) Date: Mon, 2 May 2011 05:42:21 -0700 (PDT) Subject: [R] grid Message-ID: <359038.65298.qm@web37102.mail.mud.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From liyatle at gmail.com Mon May 2 10:47:01 2011 From: liyatle at gmail.com (liyatle) Date: Mon, 2 May 2011 01:47:01 -0700 (PDT) Subject: [R] Problem installing new packages In-Reply-To: <4B999494.9030409@earthlink.net> References: <4B998F02.20902@earthlink.net> <4B999494.9030409@earthlink.net> Message-ID: <1304326021441-3489573.post@n4.nabble.com> I'm having that problem, what did you do? -- View this message in context: http://r.789695.n4.nabble.com/Problem-installing-new-packages-tp1589974p3489573.html Sent from the R help mailing list archive at Nabble.com. From mark_difford at yahoo.co.uk Mon May 2 10:32:16 2011 From: mark_difford at yahoo.co.uk (Mark Difford) Date: Mon, 2 May 2011 01:32:16 -0700 (PDT) Subject: [R] bwplot in ascending order In-Reply-To: References: Message-ID: <1304325136412-3489544.post@n4.nabble.com> On May 01 (2011) Harold Doran wrote: >> Can anyone point me to examples with R code where bwplot in lattice is >> used to order the boxes in >> ascending order? You don't give an example and what you want is not entirely clear. Presumably you want ordering by the median (boxplot, and based on the example you point to, where the median is mentioned as an _example_). Is this what you want? ## bwplot(var1 ~ var2|condition, dat, index.cond = function(x, y) reorder(y, x, median)) ## if x is numeric bwplot(var1 ~ var2|condition, dat, index.cond = function(x, y) reorder(x, y, median)) ## if y is numeric Regards, Mark. 
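
A hedged sketch of ordering by median with reorder() written directly in the formula. It uses lattice's own barley data rather than the poster's data, so variety, site and yield are only stand-ins for the real variables:

```r
library(lattice)

# Boxes are ordered by the overall median yield of each variety, and the
# panels by the median yield of each site (both in ascending order).
p <- bwplot(reorder(variety, yield, median) ~ yield | reorder(site, yield, median),
            data = barley, xlab = "yield", ylab = "variety")
print(p)
```

Note that this ordering is computed once over all panels, so every panel shows the varieties in the same sequence.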
-- View this message in context: http://r.789695.n4.nabble.com/bwplot-in-ascending-order-tp3488557p3489544.html Sent from the R help mailing list archive at Nabble.com. From stevenkennedy2263 at gmail.com Mon May 2 11:29:53 2011 From: stevenkennedy2263 at gmail.com (Steven Kennedy) Date: Mon, 2 May 2011 19:29:53 +1000 Subject: [R] subseting data In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ngokangmin at gmail.com Mon May 2 11:48:45 2011 From: ngokangmin at gmail.com (Kang Min) Date: Mon, 2 May 2011 02:48:45 -0700 (PDT) Subject: [R] Axis label colour Message-ID: <2657440c-5e05-4322-8a53-14769701815b@d19g2000prh.googlegroups.com> Hi all, Is there an argument in the axis() function to change the colour of the tick labels? I only found col.ticks, and col.lab, but they're not doing what I want. Thanks, KM From bhagwataditya at gmail.com Mon May 2 13:19:10 2011 From: bhagwataditya at gmail.com (abhagwat) Date: Mon, 2 May 2011 04:19:10 -0700 (PDT) Subject: [R] Global variables In-Reply-To: <4D263B2C.1070208@gmail.com> References: <4D2637E2.8030409@cognigencorp.com> <4D263B2C.1070208@gmail.com> Message-ID: <1304335150343-3489796.post@n4.nabble.com> Well, what would be really helpful is to restrict the scope of all non-function variables, but keep a global for scope of all function variables. Then, you still have access to all loaded functions, but you don't mix up variables. How would one do that? Adi > Is there a way I can prevent global variables to be visible within my > functions? Yes, but you probably shouldn't. You would do it by setting the environment of the function to something that doesn't have the global environment as a parent, or grandparent, etc. The only common examples of that are baseenv() and emptyenv(). 
For example, x <- 1 f <- function() print(x) -- View this message in context: http://r.789695.n4.nabble.com/Global-variables-tp3178242p3489796.html Sent from the R help mailing list archive at Nabble.com. From g.schumacher at vu.nl Mon May 2 13:54:04 2011 From: g.schumacher at vu.nl (Schumacher, G.) Date: Mon, 2 May 2011 11:54:04 +0000 Subject: [R] how to get row name using the which function Message-ID: <3BD4FCB21E88D84C8A7264CF81B0F3110AC81B40@PEXMB001A.vu.local> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From kris4ya at gmail.com Mon May 2 13:58:26 2011 From: kris4ya at gmail.com (kcchalmers) Date: Mon, 2 May 2011 04:58:26 -0700 (PDT) Subject: [R] adding legend to matplot Message-ID: <1304337506103-3489844.post@n4.nabble.com> Hi all, I am new to R programming and I was trying to write a simple code in order to plot my data. The problem is that I am not able to insert a legend corresponding to each column of the data matrix. Please can someone help me out. How can i directly get the legend relating to each data curve. 
This is the code I had written: matrix<-read.table(file="PNPLA.txt", header=TRUE, sep="\t", row.names=1 ) a<-matrix[1:6] c1<-a[,1]/a[28,1] c2<-a[,2]/a[28,2] c3<-a[,3]/a[28,3] c4<-a[,4]/a[28,4] c5<-a[,5]/a[28,5] c6<-a[,6]/a[28,6] mat<-cbind(c1,c2,c3,c4,c5,c6) x1<-mat[,1]/a[,1] x2<-mat[,2]/a[,1] x3<-mat[,3]/a[,1] x4<-mat[,4]/a[,1] x5<-mat[,5]/a[,1] x6<-mat[,6]/a[,1] final<-cbind(x1,x2,x3,x4,x5,x6) matplot(final,type="l") tfin<-t(final) colnames(tfin)<-c("PC26","PC28","PC28:02","PC30","PC32","PC3201","PC3202","PC34", "PC3401","PC3402","PC3404","PC36","PC3601","PC3602","PC3604","PC3606","PC38","PC3804","PC3806","PC40","PC4002","PC4006","PC4008","PC42","PC44","SM21") matplot(tfin,pch = 1:25, type = "o",lty=20,lwd=1.9,xlab="Time",ylab="PCmix/SM21:00 ratio") -- View this message in context: http://r.789695.n4.nabble.com/adding-legend-to-matplot-tp3489844p3489844.html Sent from the R help mailing list archive at Nabble.com. From phaebz at gmail.com Mon May 2 14:54:56 2011 From: phaebz at gmail.com (Michael Bach) Date: Mon, 02 May 2011 15:54:56 +0300 Subject: [R] Specify custom par(mfrow()) layout for defined plot() In-Reply-To: <4DBE92D4.3000301@statistik.tu-dortmund.de> (Uwe Ligges's message of "Mon, 02 May 2011 13:17:40 +0200") References: <4dbad4de.4696cc0a.4bc9.ffffa065@mx.google.com> <4DBE92D4.3000301@statistik.tu-dortmund.de> Message-ID: <4dbea9a3.90870e0a.258b.ffffdc5e@mx.google.com> Uwe Ligges writes: > On 29.04.2011 17:10, Michael Bach wrote: >> Dear R Users, >> >> I am doing stats::decompose() on 4 different time series. When I issue >> >> csdA<- decompose(tsA) >> plot(csdA) >> >> I get a summary plot for observed, trend, seasonal and random components >> of decomposed time series tsA. As I understand it, the object returned >> by decompose() has it's own plot method where mfrow(4,1) etc. is >> defined. Now suppose I wanted to wrap those mfrow(4,1) into my own >> mfrow(2,2) layout. How could I achieve this? Is there a general way to >> handle these cases? 
Something like a "meta" par(mfrow())? > > > This does not work and is one of the reasons why the grid package was developed. > Does this mean that there is no way whatsoever or that there is a workaround via the grid package?? Kind Regards, Michael Bach From HDoran at air.org Mon May 2 15:01:26 2011 From: HDoran at air.org (Doran, Harold) Date: Mon, 2 May 2011 09:01:26 -0400 Subject: [R] bwplot in ascending order In-Reply-To: <4DBE720D.5010901@statistik.tu-dortmund.de> References: <4DBE720D.5010901@statistik.tu-dortmund.de> Message-ID: Doesn't seem to work. My data structure is below (I will send data to anyone off-list who could offer support). The following code below does work, but since I concatenate Region and Gender, the labels on the lattice are ugly. dat$test <- factor(paste(dat$Region, dat$Gender, sep='_')) bymedian <- with(dat, reorder(test, finalRank, median)) bwplot(reorder(test, finalRank, median) ~ finalRank|Gender, dat, subset = Region !="", scale='free', xlab = 'Total Score', ylab = 'Region', ) > str(dat) 'data.frame': 58921 obs. of 16 variables: $ Athlete : int 13 13 13 13 13 14 14 15 15 15 ... $ Workout : Factor w/ 6 levels "11.1","11.2",..: 1 2 3 4 5 1 2 1 1 2 ... $ Result : int 309 375 46 100 300 158 232 353 359 479 ... $ Valid : Factor w/ 6 levels "bogus","invalid",..: 5 5 5 5 5 5 5 5 5 5 ... $ Gender : Factor w/ 2 levels "female","male": 2 2 2 2 2 2 2 2 2 2 ... $ Height.cm.: num 196 196 196 196 196 ... $ Weight.kg.: num 97.7 97.7 97.7 97.7 97.7 ... $ Age : int 29 29 29 29 29 42 42 24 24 24 ... $ Region : Factor w/ 18 levels "","Africa","Asia",..: 16 16 16 16 16 13 13 18 18 18 ... $ AgeCut : num 2 2 2 2 2 4 4 2 2 2 ... $ Height.met: num 1.96 1.96 1.96 1.96 1.96 ... $ spVar : chr "11.1_male" "11.2_male" "11.3_male" "11.4_male" ... $ Rank : int 1567 2253 2050 1651 1462 8155 7624 322 208 206 ... $ totalRank : int [1:58921(1d)] 8983 8983 8983 8983 8983 15779 15779 1252 1252 1252 ... ..- attr(*, "dimnames")=List of 1 .. 
..$ : chr "13" "13" "13" "13" ... $ finalRank : int 1274 1274 1274 1274 1274 2643 2643 81 81 81 ... $ totalScore: int [1:58921(1d)] 1130 1130 1130 1130 1130 390 390 1768 1768 1768 ... ..- attr(*, "dimnames")=List of 1 .. ..$ : chr "13" "13" "13" "13" ... > -----Original Message----- > From: Uwe Ligges [mailto:ligges at statistik.tu-dortmund.de] > Sent: Monday, May 02, 2011 4:58 AM > To: Doran, Harold > Cc: r-help at r-project.org > Subject: Re: [R] bwplot in ascending order > > > > On 01.05.2011 22:52, Doran, Harold wrote: > > Can anyone point me to examples with R code where bwplot in lattice is used > to order the boxes in ascending order? I have found the following discussion > and it partly works. But, I have a conditioning variable, so my example is > more like > > > > bwplot(var1 ~ var2|condition, dat) > > > I guess you are looking for something along > > bwplot(var1 ~ var2 | reorder(condition, var2, median), dat) > > Uwe Ligges > > > > > > Th example in the discussion below works only when there is not a > conditioning variable as far as I can tell. I can tweak the example below to > work, but then I get some ugly labels in the lattice plot. It seems index.cond > is supposed to help me solve this, but I cannot find good examples showing its > use. > > > > Thanks > > Harold > > > > http://r.789695.n4.nabble.com/bwplot-reorder-factor-on-y-axis-td790903.html > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. From PDowney at urban.org Mon May 2 15:04:57 2011 From: PDowney at urban.org (Downey, Patrick) Date: Mon, 2 May 2011 09:04:57 -0400 Subject: [R] how to get row name using the which function Message-ID: <0F96478603980B46AAAFBA77069582ED131C756A@UIEXCH.urban.org> Perhaps not the most elegant. 
rownames(example)[which.max(example)]

If you wanted to type less, you could always write a function.

names.max <- function(x){
  return(rownames(x)[which.max(x)])
}

-Mitch

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Schumacher, G.
Sent: Monday, May 02, 2011 7:54 AM
To: 'r-help at r-project.org'
Subject: [R] how to get row name using the which function

Dear All,

Probably a very basic question, but can't seem to work my way around it. I want to know which row has the maximum value. But what if the row names do not correspond with the row numbers. In the example below, you'll see that the max of example is row 4, but the name of row 4 is "9". How do I get R to return "9" as value, instead of 4.

example <- matrix(c(0,0,0,1), 4, 1, dimnames=list(c("1", "3", "5", "9"), c("1")))
which.max(example)
[1] 4

Hope someone can help out.

Gijs Schumacher, MSc
PhD candidate
--------------------------------------
Department of Political Science
VU University Amsterdam

Contact:
Tel: +31(0)20 5986798
Fax: +31(0)20 5986820
Web: http://home.fsw.vu.nl/g.schumacher
Email: g.schumacher at vu.nl

Visiting address:
Metropolitan
Buitenveldertselaan 2
Room Z - 333

Mail:
De Boelelaan 1081
1081 HV Amsterdam
The Netherlands

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
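
A small sketch for the case where the matrix has more than one column. which.max() returns a position in the flattened matrix, so the row has to be recovered before the name is looked up. The helper name row_of_max is made up for illustration:

```r
example <- matrix(c(0, 0, 0, 1), 4, 1,
                  dimnames = list(c("1", "3", "5", "9"), c("1")))

# Hypothetical helper: name of the row holding the largest entry of a matrix.
row_of_max <- function(m) {
  pos <- arrayInd(which.max(m), dim(m))  # (row, column) of the first maximum
  rownames(m)[pos[1, 1]]
}

row_of_max(example)
```

For a one-column matrix this gives the same answer as indexing rownames() with which.max() directly.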
From ripley at stats.ox.ac.uk Mon May 2 15:32:16 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Mon, 2 May 2011 14:32:16 +0100 (BST) Subject: [R] Specify custom par(mfrow()) layout for defined plot() In-Reply-To: <4dbea9a3.90870e0a.258b.ffffdc5e@mx.google.com> References: <4dbad4de.4696cc0a.4bc9.ffffa065@mx.google.com> <4DBE92D4.3000301@statistik.tu-dortmund.de> <4dbea9a3.90870e0a.258b.ffffdc5e@mx.google.com> Message-ID: On Mon, 2 May 2011, Michael Bach wrote: > Uwe Ligges writes: > >> On 29.04.2011 17:10, Michael Bach wrote: >>> Dear R Users, >>> >>> I am doing stats::decompose() on 4 different time series. When I issue >>> >>> csdA<- decompose(tsA) >>> plot(csdA) >>> >>> I get a summary plot for observed, trend, seasonal and random components >>> of decomposed time series tsA. As I understand it, the object returned >>> by decompose() has it's own plot method where mfrow(4,1) etc. is >>> defined. Now suppose I wanted to wrap those mfrow(4,1) into my own >>> mfrow(2,2) layout. How could I achieve this? Is there a general way to >>> handle these cases? Something like a "meta" par(mfrow())? >> >> >> This does not work and is one of the reasons why the grid package was developed. >> > > Does this mean that there is no way whatsoever or that there is a > workaround via the grid package?? See the gridBase package. > > Kind Regards, > Michael Bach > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. 
Ripley,  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

From frank.lehmann62 at freenet.de Mon May 2 14:51:50 2011
From: frank.lehmann62 at freenet.de (Frank Lehmann)
Date: Mon, 2 May 2011 14:51:50 +0200
Subject: [R] problem with Sweave and pdflatex
Message-ID: <000301cc08c7$b4c2ef50$1e48cdf0$@lehmann62@freenet.de>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From Stefan.Hoj-Edwards at agrsci.dk Mon May 2 15:12:27 2011
From: Stefan.Hoj-Edwards at agrsci.dk (Stefan McKinnon Høj-Edwards)
Date: Mon, 2 May 2011 15:12:27 +0200
Subject: [R] Problems with Rterm 2.13.0 - but not RGui
Message-ID: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F69B@DJFEXMBX01.djf.agrsci.dk>

Hi all,

I have just installed R 2.13.0 and I am experiencing problems with the terminal, but not with the GUI interface. I am on Windows 7.

When running "R" or "Rterm" from a commandline I receive the following:

Warning message:
In normalizePath(path.expand(path), winslash, mustWork) :
  path[3]="C:/Programmer/R/R-2.13.0/library": Adgang nægtet

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Warning message:
package "methods" in options("defaultPackages") was not found
During startup - Warning messages:
1: package 'datasets' in options("defaultPackages") was not found
2: package 'utils' in options("defaultPackages") was not found
3: package 'grDevices' in options("defaultPackages") was not found
4: package 'graphics' in options("defaultPackages") was not found
5: package 'stats' in options("defaultPackages") was not found
6: package 'methods' in options("defaultPackages") was not found

Notice: "C:/Programmer/" is the Danish equivalent of "C:/Program Files". The first error "Adgang nægtet" is directly translated to "Access denied".

Any suggestions as how to fix this?

Kind regards,
Stefan McKinnon Edwards

From rvaradhan at jhmi.edu Mon May 2 15:41:28 2011
From: rvaradhan at jhmi.edu (Ravi Varadhan)
Date: Mon, 2 May 2011 09:41:28 -0400
Subject: [R] strange fluctuations in system.time with kernapply
In-Reply-To: <4DBE79D4.7060503@statistik.tu-dortmund.de>
References: <1303926914164-3478961.post@n4.nabble.com> <1304103421696-3484371.post@n4.nabble.com> <4DBB2FBB.1040007@physik.hu-berlin.de> <4DBE79D4.7060503@statistik.tu-dortmund.de>
Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669FAA4C0@WIGGUMVS.win.ad.jhu.edu>

Why not do `zero padding' to improve the efficiency, i.e. add a bunch of zeros to the end of the data vector such that the resulting vector is a power of 2? This is very common in signal processing, and is legitimate since zero padding does not add any new information.

Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph.
(410) 502-2619 email: rvaradhan at jhmi.edu -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Uwe Ligges Sent: Monday, May 02, 2011 5:31 AM To: Alexander Senger Cc: r-help at r-project.org Subject: Re: [R] strange fluctuations in system.time with kernapply On 29.04.2011 23:38, Alexander Senger wrote: > Hello expeRts, > > > here is something which strikes me as kind of odd and I would like to > ask for some enlightenment: > > First let's do this: > > tkern <- kernel("modified.daniell", c(5,5)) > test <- rep(1,1000000) > system.time(kernapply(test,tkern)) > User System verstrichen > 1.100 0.040 1.136 > > That was easy. Now this: > > test <- rep(1,1100000) > system.time(kernapply(test,tkern)) > User System verstrichen > 1.40 0.02 1.43 > > Still fine. Now this: > > test <- rep(1,1110000) > system.time(kernapply(test,tkern)) > User System verstrichen > 1.390 0.020 1.409 > > Ok, by now it seems boring. But wait: > > test <- rep(1,1110300) > system.time(kernapply(test,tkern)) > User System verstrichen > 12.270 0.030 12.319 > > There is a sudden - and repeatable! - jump in the time needed to execute > kernapply. At least from a naive point of view there should not be much > difference between applying a kernel to a vector 1110000 or 1110300 > entries long. But maybe there is some limit here? > > So I tried this: > > test <- rep(1,1110400) > system.time(kernapply(test,tkern)) > User System verstrichen > 1.96 0.01 1.97 > > which doesn't fit into the pattern. But the best thing is still to come. > When I try this > > test <- rep(1,1110308) > system.time(kernapply(test,tkern)) > > then the computer starts to run and does so for longer than 15 minutes > until when I normally kill the process. As noted above this behaviour is > repeatable and occurs every time I issue these commands. > > I really would like to know if there is some magic to the number 1110308 > I'm not aware of. 
The magic is that the length of the vector, 1110308, is inefficient for the fft() used within kernapply(). You need integer powers of 2 for a really fast FFT.

You can also try smaller numbers to get longer runtimes, e.g.: 100003

As an example, compare:

system.time(fft(rep(1, 32768))) # roughly 0 seconds
system.time(fft(rep(1, 32771))) # almost 10 seconds

Uwe Ligges

>
>
> Last but not least, here is my
>
> sessionInfo()
> R version 2.10.1 (2009-12-14)
> x86_64-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=de_DE.utf8 LC_NUMERIC=C
> [3] LC_TIME=de_DE.utf8 LC_COLLATE=de_DE.utf8
> [5] LC_MONETARY=C LC_MESSAGES=de_DE.utf8
> [7] LC_PAPER=de_DE.utf8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.10.1
>
>
> Thank you,
>
> Alex
>

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

From murdoch.duncan at gmail.com Mon May 2 15:43:32 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Mon, 02 May 2011 09:43:32 -0400
Subject: [R] Global variables
In-Reply-To: <1304335150343-3489796.post@n4.nabble.com>
References: <4D2637E2.8030409@cognigencorp.com> <4D263B2C.1070208@gmail.com> <1304335150343-3489796.post@n4.nabble.com>
Message-ID: <4DBEB504.6040109@gmail.com>

On 02/05/2011 7:19 AM, abhagwat wrote:
> Well, what would be really helpful is to restrict the scope of all
> non-function variables, but keep a global for scope of all function
> variables. Then, you still have access to all loaded functions, but you
> don't mix up variables.
>
> How would one do that?

You can't without low level modifications.
Before R has done the lookup, it doesn't know if an object is a function or not. It can guess by usage, e.g. it can recognize that "print" should be a function in print(1) and it will ignore non-functions named "print", but it is very common in R code to do things like fn <- print fn(1) and that would fail. But if you want to experiment with the change, you can, because R is open source. I doubt if you'll get much help unless you give a really convincing argument (on the R-devel list, not on this list) why to make the change. Duncan Murdoch > Adi > > > > Is there a way I can prevent global variables to be visible within my > > functions? > > Yes, but you probably shouldn't. You would do it by setting the > environment of the function to something that doesn't have the global > environment as a parent, or grandparent, etc. The only common examples > of that are baseenv() and emptyenv(). For example, > > x<- 1 > f<- function() print(x) > > > -- > View this message in context: http://r.789695.n4.nabble.com/Global-variables-tp3178242p3489796.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Mon May 2 15:49:46 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 2 May 2011 06:49:46 -0700 Subject: [R] Axis label colour In-Reply-To: <2657440c-5e05-4322-8a53-14769701815b@d19g2000prh.googlegroups.com> References: <2657440c-5e05-4322-8a53-14769701815b@d19g2000prh.googlegroups.com> Message-ID: On May 2, 2011, at 2:48 AM, Kang Min wrote: > Hi all, > > Is there an argument in the axis() function to change the colour of > the tick labels? I only found col.ticks, and col.lab, but they're not > doing what I want. 
You just need to read a bit further down in the help page for `axis`.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

From lebatsnok at gmail.com Mon May 2 16:02:00 2011
From: lebatsnok at gmail.com (Kenn Konstabel)
Date: Mon, 2 May 2011 17:02:00 +0300
Subject: [R] Global variables
In-Reply-To: <1304335150343-3489796.post@n4.nabble.com>
References: <4D2637E2.8030409@cognigencorp.com> <4D263B2C.1070208@gmail.com> <1304335150343-3489796.post@n4.nabble.com>
Message-ID:

On Mon, May 2, 2011 at 2:19 PM, abhagwat wrote:
> Well, what would be really helpful is to restrict the scope of all
> non-function variables, but keep a global for scope of all function
> variables. Then, you still have access to all loaded functions, but you
> don't mix up variables.
>
> How would one do that?

But what's the real motivation for this? It could be useful for ensuring that there are no unexpected global variables in your code but you can do it using findGlobals in codetools package.

fun <- function() mean(x)
findGlobals(fun, merge=FALSE)

Kenn

>> Is there a way I can prevent global variables to be visible within my
>> functions?
>
> Yes, but you probably shouldn't. You would do it by setting the
> environment of the function to something that doesn't have the global
> environment as a parent, or grandparent, etc. The only common examples
> of that are baseenv() and emptyenv(). For example,
>
> x <- 1
> f <- function() print(x)
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Global-variables-tp3178242p3489796.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
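
A minimal sketch that puts the two suggestions in this thread side by side: listing the globals a function relies on with codetools, and cutting a copy of the function off from the workspace by giving it baseenv() as its environment. The names f, g, globals and msg are made up for illustration:

```r
library(codetools)

x <- 1
f <- function() x + 1

# List what f picks up from outside its own body.
globals <- findGlobals(f, merge = FALSE)
globals$variables

# Copy f and detach the copy from the workspace: lookup now starts in base,
# so the workspace variable x is no longer visible to g.
g <- f
environment(g) <- baseenv()
msg <- tryCatch(g(), error = function(e) conditionMessage(e))
msg
```

Functions from packages other than base are also invisible to g, which is the practical cost of this approach that the thread points out.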
>

From mathias.walter at googlemail.com Mon May 2 16:03:45 2011
From: mathias.walter at googlemail.com (Mathias Walter)
Date: Mon, 2 May 2011 16:03:45 +0200
Subject: [R] 3-way contingency table
In-Reply-To: <1CD70FCD-B971-48CE-B357-8B2EAF58A301@comcast.net>
References: <1CD70FCD-B971-48CE-B357-8B2EAF58A301@comcast.net>
Message-ID:

Hi David,

thanks for your quick response. It was really helpful.

--
Kind regards,
Mathias

2011/4/29 David Winsemius :
>
> On Apr 29, 2011, at 6:47 AM, Mathias Walter wrote:
>
>> Hi,
>>
>> I have large data frame with many columns. A short example is given below:
>>
>>> dataH
>>
>>      host ms01 ms31 ms33 ms34
>> 1  cattle    4   20    9    6
>> 2   sheep    4    3    4    5
>> 3  cattle    4    3    4    5
>> 4  cattle    4    3    4    5
>> 5   sheep    4    3    5    5
>> 6    goat    4    3    4    5
>> 7   sheep    4    3    5    5
>> 8    goat    4    3    4    5
>> 9    goat    4    3    4    5
>> 10 cattle    4    3    4    5
>>
>> Now I want to determine the the frequencies of every unique value in
>> every column depending on the host column.
>>
>> It is quite easy to determine the frequencies in total with the
>> following command:
>>
>>> dataH2 <- dataH[,c(2,3,4,5)]
>>> table(as.matrix(dataH2), colnames(dataH2)[col(dataH2)], useNA="ifany")
>>
>>    ms01 ms31 ms33 ms34
>> 3     0    9    0    0
>> 4    10    0    7    0
>> 5     0    0    2    9
>> 6     0    0    0    1
>> 9     0    0    1    0
>> 20    0    1    0    0
>>
>> But I cannot manage to get it dependent on the host.
>>
>> I tried
>>
>>> xtabs(cbind(ms01, ms31, ms33, ms34) ~ ., dataH)
>>
>> and many other ways but I'm not stressful.
>>
>> I can get it for each column individually with
>>
>>> with(dataH, table(host, ms33))
>>
>>        ms33
>> host   4 5 9
>> cattle 3 0 1
>> deer   0 0 0
>> goat   3 0 0
>> human  0 0 0
>> sheep  1 2 0
>> tick   0 0 0
>>
>> But I do not want to repeat the command for every column.
I need a
>> single table which can be plotted as a balloon plot, for instance.
>
> You have obviously not given us the full data from which your "correct
> answer" was drawn, but see if this is going the right direction:
>
> require(reshape)
>> dataHm <- melt(dataH)
> Using host as id variables
>> xtabs(~host+value+variable, dataHm)
> , , variable = ms01
>
>         value
> host     3 4 5 6 9 20
>   cattle 0 4 0 0 0  0
>   goat   0 3 0 0 0  0
>   sheep  0 3 0 0 0  0
>
> , , variable = ms31
>
>         value
> host     3 4 5 6 9 20
>   cattle 3 0 0 0 0  1
>   goat   3 0 0 0 0  0
>   sheep  3 0 0 0 0  0
>
> , , variable = ms33
>
>         value
> host     3 4 5 6 9 20
>   cattle 0 3 0 0 1  0
>   goat   0 3 0 0 0  0
>   sheep  0 1 2 0 0  0
>
> , , variable = ms34
>
>         value
> host     3 4 5 6 9 20
>   cattle 0 0 3 1 0  0
>   goat   0 0 3 0 0  0
>   sheep  0 0 3 0 0  0
>
>>
>> Does anybody knows how to achieve this?
>>
>> --
>> Kind regards,
>> Mathias
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>
>

From mathias.walter at gmx.net Mon May 2 16:06:38 2011
From: mathias.walter at gmx.net (Mathias Walter)
Date: Mon, 2 May 2011 16:06:38 +0200
Subject: [R] pie of pie chart
Message-ID:

Hi,

despite the fact that pie charts often fail, I'll draw them anyway (in a case were they are not fail ;-) ).

Does anybody know a package/methods which can draw pie of pie or bar of pie charts similar to that in MS Excel?
-- Kind regards, Mathias From biomathjdaily at gmail.com Mon May 2 16:28:22 2011 From: biomathjdaily at gmail.com (Jonathan Daily) Date: Mon, 2 May 2011 10:28:22 -0400 Subject: [R] pie of pie chart In-Reply-To: References: Message-ID: The package ggplot2 can do this using a density statistic, polar coordinates, and faceting. Extra documentation for the package can be found at the author's site [1]. [1] http://had.co.nz/ On Mon, May 2, 2011 at 10:06 AM, Mathias Walter wrote: > Hi, > > despite the fact that pie charts often fail, I'll draw them anyway (in > a case were they are not fail ;-) ). > > Does anybody know a package/methods which can draw pie of pie or bar > of pie charts similar to that in MS Excel? > > -- > Kind regards, > Mathias > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- =============================================== Jon Daily Technician =============================================== #!/usr/bin/env outside # It's great, trust me. From f.harrell at vanderbilt.edu Mon May 2 16:30:10 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Mon, 2 May 2011 07:30:10 -0700 (PDT) Subject: [R] help with a survplot In-Reply-To: <20110502104105.1660c591@caprica> References: <20110430164400.531dfe9a@caprica> <85D19B4D-6ED7-4A15-BC43-14F8F885C062@comcast.net> <20110502104105.1660c591@caprica> Message-ID: <1304346610371-3490126.post@n4.nabble.com> Please elaborate. Thanks Frank Marco Barb?ra-2 wrote: > > Thank you very much. > > Despite prof. Harrell's support (for whom I feel great > esteem) I still remain doubtful about this feature. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/help-with-a-survplot-tp3485998p3490126.html Sent from the R help mailing list archive at Nabble.com. From ligges at statistik.tu-dortmund.de Mon May 2 16:51:44 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 16:51:44 +0200 Subject: [R] Problem installing new packages In-Reply-To: <1304326021441-3489573.post@n4.nabble.com> References: <4B998F02.20902@earthlink.net> <4B999494.9030409@earthlink.net> <1304326021441-3489573.post@n4.nabble.com> Message-ID: <4DBEC500.1030804@statistik.tu-dortmund.de> On 02.05.2011 10:47, liyatle wrote: > I'm having that problem, what did you do? 1. This is the mailing list R-help, not an individual person. I guess you sent to the wrong address. 2. PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 3. Please always quote the questions/answers you are referring to. 4. The post your are referring to is ancient. I guess you either just downloaded the files or you do not have write permissions to the library or you install do a different library than the one you expect. But since you have not given any details, we cannot help. Uwe Ligges > > -- > View this message in context: http://r.789695.n4.nabble.com/Problem-installing-new-packages-tp1589974p3489573.html > Sent from the R help mailing list archive at Nabble.com. 
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From biomathjdaily at gmail.com Mon May 2 16:58:56 2011
From: biomathjdaily at gmail.com (Jonathan Daily)
Date: Mon, 2 May 2011 10:58:56 -0400
Subject: [R] Problems with Rterm 2.13.0 - but not RGui
In-Reply-To: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F69B@DJFEXMBX01.djf.agrsci.dk>
References: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F69B@DJFEXMBX01.djf.agrsci.dk>
Message-ID:

The message is pretty clear. Access denied means you don't have
permission to access the path. This also explains why the packages
fail to load - you don't have access to R's package library.

It most likely works in RGui because you are clicking it/running it as
admin (you did not specify how you ran RGui).

2011/5/2 Stefan McKinnon Høj-Edwards :
> Hi all,
>
> I have just installed R 2.13.0 and I am experiencing problems with the terminal, but not with the GUI interface.
> I am on Windows 7.
>
> When running "R" or "Rterm" from a command line I receive the following:
>
> Warning message:
> In normalizePath(path.expand(path), winslash, mustWork) :
>   path[3]="C:/Programmer/R/R-2.13.0/library": Adgang nægtet
>
> R version 2.13.0 (2011-04-13)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> Warning message:
> package "methods" in options("defaultPackages") was not found
> During startup - Warning messages:
> 1: package 'datasets' in options("defaultPackages") was not found
> 2: package 'utils' in options("defaultPackages") was not found
> 3: package 'grDevices' in options("defaultPackages") was not found
> 4: package 'graphics' in options("defaultPackages") was not found
> 5: package 'stats' in options("defaultPackages") was not found
> 6: package 'methods' in options("defaultPackages") was not found
>
>
> Notice: "C:/Programmer/" is the Danish equivalent of "C:/Program Files".
> The first error "Adgang nægtet" is directly translated to "Access denied".
>
> Any suggestions as to how to fix this?
>
> Kind regards,
> Stefan McKinnon Edwards
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
===============================================
Jon Daily
Technician
===============================================
#!/usr/bin/env outside
# It's great, trust me.

From ligges at statistik.tu-dortmund.de Mon May 2 17:51:52 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Mon, 02 May 2011 17:51:52 +0200
Subject: [R] problem with Sweave and pdflatex
In-Reply-To: <000301cc08c7$b4c2ef50$1e48cdf0$@lehmann62@freenet.de>
References: <000301cc08c7$b4c2ef50$1e48cdf0$@lehmann62@freenet.de>
Message-ID: <4DBED318.7070808@statistik.tu-dortmund.de>

Have you checked the permissions in the working directory?
Is there a blank in your path? (LaTeX does not like spaces in the path.)
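
Both checks can be run from within R before calling Sweave. The following is only a minimal sketch: check_sweave_dir is a hypothetical helper name (it is not part of Sweave or base R), and file.access() is documented as being potentially unreliable on some Windows file systems, so a TRUE result is a hint rather than a guarantee.

```r
# Hypothetical helper (not part of Sweave or base R): reports whether a
# directory looks writable and whether its full path contains a blank.
check_sweave_dir <- function(dir = getwd()) {
  path <- normalizePath(dir, winslash = "/", mustWork = TRUE)
  c(
    # file.access() returns 0 when the requested access (mode 2 = write) is granted
    writable  = unname(file.access(path, mode = 2) == 0),
    # blanks anywhere in the path are what LaTeX tends to choke on
    has_space = grepl(" ", path, fixed = TRUE)
  )
}

check_sweave_dir()
```

If writable is FALSE or has_space is TRUE, copying the .Rnw file to a directory such as C:/work and running Sweave there is a cheap way to confirm the diagnosis.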
Uwe Ligges On 02.05.2011 14:51, Frank Lehmann wrote: > Hallo, > > > > when I plot figures with Sweave, I get the message "pdflatex: Permission > denied". This problem only occurs while working on local system. When I copy > the *.rnw-File to my AFS drive, there is no problem at all. > > > > Here is a small example: > > > > \documentclass{scrartcl} > > \usepackage[OT1]{fontenc} > > \usepackage[latin1]{inputenc} > > \usepackage[ngerman]{babel} > > \usepackage[pdftex]{graphicx} > > \usepackage{Sweave} > > > > \begin{document} > > > > \setkeys{Gin}{width=\textwidth} > > \begin{figure}[htbp] > > <>= > > x<- 1:10 > > plot(x) > > @ > > \caption{Eine einfache Grafik} > > \end{figure} > > > > \end{document} > > > > Does anyone have an idea, how to solve that problem? Im working with Windows > XP. > > > > Thanks! > > > > Frank > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Mon May 2 17:56:55 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 02 May 2011 17:56:55 +0200 Subject: [R] how to get row name using the which function In-Reply-To: <3BD4FCB21E88D84C8A7264CF81B0F3110AC81B40@PEXMB001A.vu.local> References: <3BD4FCB21E88D84C8A7264CF81B0F3110AC81B40@PEXMB001A.vu.local> Message-ID: <4DBED447.1090100@statistik.tu-dortmund.de> rownames(which(example == max(example), arr.ind=TRUE)) Uwe Ligges On 02.05.2011 13:54, Schumacher, G. wrote: > Dear All, > > Probably a very basic question, but can't seem to work my way around it. > > I want to which row has the maximum value. But what if the row names do not correspond with the row numbers. In the example below, you'll see that the max of example is row 4, but the name of row 4 is "9". 
How do I get R to return "9" as value, instead of 4. > > example<- matrix(c(0,0,0,1), 4, 1, dimnames=list(c("1", "3", "5", "9"), c("1"))) > which.max(example) > > [1] 4 > > Hope someone can help out. > > Gijs Schumacher, MSc > PhD candidate > > -------------------------------------- > Department of Political Science > VU University Amsterdam > > Contact: > Tel: +31(0)20 5986798 > Fax: +31(0)20 5986820 > Web: http://home.fsw.vu.nl/g.schumacher > Email: g.schumacher at vu.nl > > Visiting address: > Metropolitan > Buitenveldertselaan 2 > Room Z - 333 > > Mail: > De Boelelaan 1081 > 1081 HV Amsterdam > The Netherlands > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From phaebz at gmail.com Mon May 2 18:17:22 2011 From: phaebz at gmail.com (Michael Bach) Date: Mon, 02 May 2011 19:17:22 +0300 Subject: [R] Specify custom par(mfrow()) layout for defined plot() In-Reply-To: (Brian Ripley's message of "Mon, 2 May 2011 14:32:16 +0100 (BST)") References: <4dbad4de.4696cc0a.4bc9.ffffa065@mx.google.com> <4DBE92D4.3000301@statistik.tu-dortmund.de> <4dbea9a3.90870e0a.258b.ffffdc5e@mx.google.com> Message-ID: <4dbed916.d94fcc0a.2985.267a@mx.google.com> Prof Brian Ripley writes: > On Mon, 2 May 2011, Michael Bach wrote: > >> Uwe Ligges writes: >> >>> On 29.04.2011 17:10, Michael Bach wrote: >>>> Dear R Users, >>>> >>>> I am doing stats::decompose() on 4 different time series. When I issue >>>> >>>> csdA<- decompose(tsA) >>>> plot(csdA) >>>> >>>> I get a summary plot for observed, trend, seasonal and random components >>>> of decomposed time series tsA. As I understand it, the object returned >>>> by decompose() has it's own plot method where mfrow(4,1) etc. is >>>> defined. 
Now suppose I wanted to wrap those mfrow(4,1) into my own >>>> mfrow(2,2) layout. How could I achieve this? Is there a general way to >>>> handle these cases? Something like a "meta" par(mfrow())? >>> >>> >>> This does not work and is one of the reasons why the grid package was developed. >>> >> >> Does this mean that there is no way whatsoever or that there is a >> workaround via the grid package?? > > See the gridBase package. > Will do. Thanks for the hint From tal.galili at gmail.com Mon May 2 19:02:01 2011 From: tal.galili at gmail.com (Tal Galili) Date: Mon, 2 May 2011 20:02:01 +0300 Subject: [R] Tests for the need of cluster analysis In-Reply-To: References: <1304268859064-3488097.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ckalexa2 at ncsu.edu Mon May 2 18:45:18 2011 From: ckalexa2 at ncsu.edu (Clemontina Alexander) Date: Mon, 2 May 2011 12:45:18 -0400 Subject: [R] Lasso with Categorical Variables Message-ID: Hi! This is my first time posting. I've read the general rules and guidelines, but please bear with me if I make some fatal error in posting. Anyway, I have a continuous response and 29 predictors made up of continuous variables and nominal and ordinal categorical variables. I'd like to do lasso on these, but I get an error. The way I am using "lars" doesn't allow for the factors. Is there a special option or some other method in order to do lasso with cat. variables? Here is and example (considering ordinal variables as just nominal): set.seed(1) Y <- rnorm(10,0,1) X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE)) X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE)) X3 <- sample(x=30:55, size=10, replace=TRUE) # think age X4 <- rchisq(10, df=4, ncp=0) X <- data.frame(X1,X2,X3,X4) > str(X) 'data.frame': 10 obs. 
of 4 variables: $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2 $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3 $ X3: int 51 46 50 44 43 50 30 42 49 48 $ X4: num 2.86 1.55 1.94 2.45 2.75 ... I'd like to do: obj <- lars(x=X, y=Y, type = "lasso") Instead, what I have been doing is converting all data to continuous but I think this is really bad! XX <- data.matrix(X) obj <- lars(x=XX, y=Y, type = "lasso") Thanks for any consideration, Tina From harwood262 at gmail.com Mon May 2 17:38:02 2011 From: harwood262 at gmail.com (Mike Harwood) Date: Mon, 2 May 2011 08:38:02 -0700 (PDT) Subject: [R] ID parameter in model Message-ID: <197e8ae6-a4f8-42c8-9318-8162a3cbae40@gu8g2000vbb.googlegroups.com> Hello, I am apparently confused about the use of an id parameter for an event history/survival model, and why the EHA documentation for aftreg does not specify one. All assistance and insights are appreciated. Attempting to specifiy an id variable with the documentation example generates an "overlapping intervals" error, so I sorted the original mort dataframe and set subsequent entry times an id to the previous exit time + 0.0001. This allowed me to see the affect of the id parameter on the coefficients and significance tests, and prompted my question. The code I used is shown below, with the results at the bottom. Thanks in advance! 
Mike head(mort) ## data clearly contains multiple entries for some of the dataframe ids no.id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort) ## Inital model id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id) ## overlapping intervals error mort.sort <- ## ensure records ordered mort[ order(mort$id, mort$enter),] ## remove overlap for (i in 2:nrow(mort.sort)){ if (mort.sort[i,'id'] == mort.sort[i-1,'id']) mort.sort[i,'enter'] <- mort.sort[i-1, 'exit'] + 0.0001 } no.id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, data = mort.sort) ## initial model on modified df id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, id=id, data = mort.sort) ## with id parameter #=== output ===========# > no.id.aft.sort Call: aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort) Covariate W.mean Coef Exp(Coef) se(Coef) Wald p ses lower 0.416 0 1 (reference) upper 0.584 -0.347 0.707 0.089 0.000 log(scale) 3.603 36.704 0.065 0.000 log(shape) 0.331 1.393 0.058 0.000 Events 276 Total time at risk 17045 Max. log. likelihood -1391.4 LR test statistic 16.1 Degrees of freedom 1 Overall p-value 6.04394e-05 > id.aft.sort Call: aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort, id = id) Covariate W.mean Coef Exp(Coef) se(Coef) Wald p ses lower 0.416 0 1 (reference) upper 0.584 -0.364 0.695 0.090 0.000 log(scale) 3.588 36.171 0.065 0.000 log(shape) 0.338 1.402 0.058 0.000 Events 276 Total time at risk 17045 Max. log. likelihood -1390.8 LR test statistic 17.2 Degrees of freedom 1 Overall p-value 3.3091e-05 > From mweiss at temple.edu Mon May 2 19:13:12 2011 From: mweiss at temple.edu (MARY A. WEISS) Date: Mon, 2 May 2011 13:13:12 -0400 Subject: [R] Tests for the need of cluster analysis In-Reply-To: References: <1304268859064-3488097.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From mailinglist.honeypot at gmail.com Mon May 2 19:51:00 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Mon, 2 May 2011 13:51:00 -0400 Subject: [R] Lasso with Categorical Variables In-Reply-To: References: Message-ID: Hi, On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander wrote: > Hi! This is my first time posting. I've read the general rules and > guidelines, but please bear with me if I make some fatal error in > posting. Anyway, I have a continuous response and 29 predictors made > up of continuous variables and nominal and ordinal categorical > variables. I'd like to do lasso on these, but I get an error. The way > I am using "lars" doesn't allow for the factors. Is there a special > option or some other method in order to do lasso with cat. variables? > > Here is and example (considering ordinal variables as just nominal): > > set.seed(1) > Y <- rnorm(10,0,1) > X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE)) > X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE)) > X3 <- sample(x=30:55, size=10, replace=TRUE) ?# think age > X4 <- rchisq(10, df=4, ncp=0) > X <- data.frame(X1,X2,X3,X4) > >> str(X) > 'data.frame': ? 10 obs. of ?4 variables: > ?$ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2 > ?$ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3 > ?$ X3: int ?51 46 50 44 43 50 30 42 49 48 > ?$ X4: num ?2.86 1.55 1.94 2.45 2.75 ... > > > I'd like to do: > obj <- lars(x=X, y=Y, type = "lasso") > > Instead, what I have been doing is converting all data to continuous > but I think this is really bad! Yeah, it is. 
Check out the "Categorical Predictor Variables" section here for a way to handle such predictor vars: http://www.psychstat.missouristate.edu/multibook/mlt08m.html HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From dwinsemius at comcast.net Mon May 2 20:47:52 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 2 May 2011 11:47:52 -0700 Subject: [R] Lasso with Categorical Variables In-Reply-To: References: Message-ID: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net> On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote: > Hi, > > On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander > wrote: >> Hi! This is my first time posting. I've read the general rules and >> guidelines, but please bear with me if I make some fatal error in >> posting. Anyway, I have a continuous response and 29 predictors made >> up of continuous variables and nominal and ordinal categorical >> variables. I'd like to do lasso on these, but I get an error. The way >> I am using "lars" doesn't allow for the factors. Is there a special >> option or some other method in order to do lasso with cat. variables? >> >> Here is and example (considering ordinal variables as just nominal): >> >> set.seed(1) >> Y <- rnorm(10,0,1) >> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE)) >> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE)) >> X3 <- sample(x=30:55, size=10, replace=TRUE) # think age >> X4 <- rchisq(10, df=4, ncp=0) >> X <- data.frame(X1,X2,X3,X4) >> >>> str(X) >> 'data.frame': 10 obs. of 4 variables: >> $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2 >> $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3 >> $ X3: int 51 46 50 44 43 50 30 42 49 48 >> $ X4: num 2.86 1.55 1.94 2.45 2.75 ... 
>> >> >> I'd like to do: >> obj <- lars(x=X, y=Y, type = "lasso") >> >> Instead, what I have been doing is converting all data to continuous >> but I think this is really bad! > > Yeah, it is. > > Check out the "Categorical Predictor Variables" section here for a way > to handle such predictor vars: > http://www.psychstat.missouristate.edu/multibook/mlt08m.html Steve's citation is somewhat helpful, but not sufficient to take the next steps. You can find details regarding the mechanics of typical linear regression in R on the ?lm page where you find that the factor variables are typically handled by model.matrix. See below: > model.matrix(~X1 + X2 + X3 + X4, X) (Intercept) X1B X1C X1D X2F X2G X2H X2I X3 X4 1 1 0 0 1 0 1 0 0 51 2.8640884 2 1 0 0 0 0 0 1 0 46 1.5462243 3 1 0 1 0 0 1 0 0 50 1.9430901 4 1 0 0 0 1 0 0 0 44 2.4504180 5 1 1 0 0 0 0 0 1 43 2.7535052 6 1 1 0 0 0 0 0 1 50 1.6200326 7 1 0 0 0 0 0 0 1 30 0.5750533 8 1 1 0 0 0 0 0 0 42 5.9224777 9 1 0 0 1 0 0 0 1 49 2.0401528 10 1 1 0 0 0 1 0 0 48 6.2995288 attr(,"assign") [1] 0 1 1 1 2 2 2 2 3 4 attr(,"contrasts") attr(,"contrasts")$X1 [1] "contr.treatment" attr(,"contrasts")$X2 [1] "contr.treatment" The numeric variables are passed through, while the dummy variables for factor columns are constructed (as treatment contrasts) and the whole thing it returned in a neat package. -- David. > > HTH, > -steve > -- David Winsemius, MD Heritage Laboratories West Hartford, CT From bbolker at gmail.com Mon May 2 21:51:20 2011 From: bbolker at gmail.com (Ben Bolker) Date: Mon, 2 May 2011 19:51:20 +0000 Subject: [R] Tests for the need of cluster analysis References: <1304268859064-3488097.post@n4.nabble.com> Message-ID: MARY A. WEISS temple.edu> writes: > > Hi, > > I am currently using STATA in my analysis. STATA has a cluster option but > does not have any tests for whether cluster analysis is necessary or not for > a dataset. 
So I am trying to figure out whether R could be used to test > whether I need to be doing cluster analysis or not. If R does tests to > determine whether cluster analysis is valid for my data, I will learn R and > use it on my data. > > My data are panel data consisting of 49 states and 25 years. Currently, I > am estimating models with fixed state and time effects. > > Thanks for any help you can give me. > > Cheers, > > Mary You might want to forward this question to the r-sig-mixed-models list. I think you are fairly far off base in comparing 'prabclus' (spatial clustering) to what Stata means by "clustered standard errors" (e.g. ). Cluster _analysis_ has to do with finding clusters in data; prabclus uses spatial information to do cluster analysis; robust cluster variances or standard errors have to do with adjusting variance/SE to account for predetermined grouping variables ("clusters" in the data, e.g. states). I don't know offhand whether there are packages in R that implement the "robust cluster variance" estimator; packages like geeglm, geepack, and especially the "sandwich" package are definitely worth looking at (they implement the equivalent of robust, but not robust cluster [as far as I can tell], variance estimators]), as well as the Econometrics Task View and the book "R for Stata Users" by Muenchen and Hilbe. A final philosophical note: I don't think you should be testing _based on your data_ whether robust or robust cluster variance estimators are more appropriate; there's a fairly dangerous data snooping issue here. Rather, you should try to decide _a priori_ based on your data what's most appropriate. Ben Bolker > > On Mon, May 2, 2011 at 1:02 PM, Tal Galili gmail.com> wrote: > > > Hi Mary, > > Are you using R for your other analysis? > > If so, What commands are you using for your analysis? > > > > p.s: please keep the rest of the R-help mailing list in the loop. > > > > Cheers, > > Tal > > > > > > [snip] > > > > > > > > > [snip] MARY A. 
WEISS temple.edu> wrote: > > > >> Hi Tal, > >> > >> Thanks for your answer. I am running models with two-way fixed effects > >> and two-way fixed effects with a cluster option. The results are very > >> different. I wanted to know if it is appropriate to cluster my data or > >> not. In looking through the R manual, > >> I thought that prabclus might help me > >> answer the question. Does prabclus include any tests that will tell me if > >> cluster analysis is appropriate to use with my data? That is, is cluster > >> analysis valid for my data? > >> > >> Thanks in advance for any help you can give me. I really appreciate it. > >> > >> Mary > >> [snip] > >> > >>> Hi Mary, > >>> I'm not sure I understood your question. > >>> > >>> Are you using this package: > >>> http://cran.r-project.org/web/packages/prabclus/index.html > >>> And asking > >>> how to decide if to use it or not? > >>> > >>> ----------------Contact > >>> Details:------------------------------------------------------- > >>> Contact me: Tal.Galili gmail.com | 972-52-7275845 > >>> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | > >>> www.r-statistics.com (English) > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> On Sun, May 1, 2011 at 7:54 PM, mary weiss temple.edu> wrote: > >>> > >>>> Does R have the capability to perform tests for the need of clustering > >>>> analysis (e.g., in prabclus)? 
I am using panel data with two-way fixed > >>>> effects but am unsure about whether I should be using a cluster option > >>>> as > >>>> well to estimate my model.-- > >>>> [snip] From claire.bild at supelec.fr Mon May 2 21:41:19 2011 From: claire.bild at supelec.fr (petrolmaniac) Date: Mon, 2 May 2011 12:41:19 -0700 (PDT) Subject: [R] Optimization - n dimension matrix Message-ID: <1304365279295-3490772.post@n4.nabble.com> Dear all, I am facing the following problem in optimization: w = (d, o1, ..., op, m1, ..., mq) is a 1 + p + q vector I want to determine: w = argmin (a - d(w))' A (a - d(w)) where a is a 1xK marix, A is the covariance matrix of vector a, d(w) is a 1xK vector which parameters are functions of parameters d, o1 .. op, m1 .. mq. Is there some function to solve this problem easily? I know optim() and ucminf() for one-dimensional optimization (I believe). Are there some tools for such n-dimensional problem? Kind regards, C. -- -- View this message in context: http://r.789695.n4.nabble.com/Optimization-n-dimension-matrix-tp3490772p3490772.html Sent from the R help mailing list archive at Nabble.com. From rcassidy at alcor.concordia.ca Mon May 2 21:15:10 2011 From: rcassidy at alcor.concordia.ca (Robert Cassidy) Date: Mon, 2 May 2011 15:15:10 -0400 Subject: [R] Help converting a data.frame to ordered factors Message-ID: I have a 96x34 array of Likert scale data (96 cases, 34 items) of ordered factors (strongly disagree, disagree, neutral, agree, strongly agree) that are coded numerically (1 through 5). I cannot seem to convert this array (in any class) into ordered vectors. I have all the cases as vectors of ordered factors, but any which way I reassemble those vectors loses the ordered factors and converts back to numbers. Can someone tell me how to either convert the data.frame into ordered factors OR how to assemble the vectors (of ordered factors) into an array that preserves the factors. Many thanks in advance for any help. 
Robert


--
Robert Cassidy, PhD
Department of Psychology
Concordia University
7141 Sherbrooke W.
Montreal (QC) H4B 1R6
tel: (514) 848-2424 x2244
fax: (514) 848-4523
office: PY-119.2

From pdavison14 at gmail.com Mon May 2 20:26:16 2011
From: pdavison14 at gmail.com (Paul Davison)
Date: Mon, 2 May 2011 14:26:16 -0400
Subject: [R] Help with coloring segments on a plot
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From mi2kelgrum at yahoo.com Mon May 2 22:03:38 2011
From: mi2kelgrum at yahoo.com (Mikkel Grum)
Date: Mon, 2 May 2011 13:03:38 -0700 (PDT)
Subject: [R] INSERT OR UPDATE
Message-ID: <371831.48337.qm@web65709.mail.ac4.yahoo.com>

I'm trying to insert rows of a data.frame into a database table, or update where the key fields of a record already exist in the table. I've come up with a possible solution below, but would like to hear if anyone has a better solution.

# The problem demonstrated:
# Create a data.frame with test values
library(RODBC)
tbl <- data.frame(
    key1 = rep(1:3, each = 2),
    key2 = rep(LETTERS[1:2], 3),
    somevalue = rnorm(6)
)

# Create table in database using the following SQL
CREATE TABLE tbl
(
    key1 integer NOT NULL,
    key2 character varying(1) NOT NULL,
    somevalue double precision,
    CONSTRAINT pktbl PRIMARY KEY (key1, key2)
)

# Continue in R
pg <- odbcConnect("testdb")
sqlSave(pg, tbl[1:2, ], append = TRUE, rownames = FALSE)
sqlSave(pg, tbl[3, ], append = TRUE, rownames = FALSE)
tbl[1, 3] <- 1
sqlUpdate(pg, tbl[1:4, ], index = c("key1", "key2")) # Fails

# Can replace the above sqlUpdate with:
sqlUpdate(pg, tbl[1:3, ], index = c("key1", "key2"))
sqlSave(pg, tbl[4, ], append = TRUE, rownames = FALSE)

# Proposed solution:
tbl[1, 3] <- 0
tmp <- tbl
yes <- sqlQuery(pg, "SELECT key1, key2 FROM tbl", as.is = TRUE)
for (i in seq(along = yes$key1)) {
    sqlUpdate(pg, tmp[tmp$key1 == yes$key1[i] & tmp$key2 == yes$key2[i], ],
        "tbl", index = c("key1", "key2"))
    tmp <- tmp[!(tmp$key1 == yes$key1[i] & tmp$key2 == yes$key2[i]), ]
}
sqlSave(pg, tmp, "tbl", append = TRUE, rownames = FALSE)

This is fine for small tables, where the need for updates is frequent, and there is no risk of anyone else doing the same thing at the same time. If the table is big and updates are rare, it seems like quite an overhead for what would essentially be inserts. Does anyone have a more rational way of doing this with big data sets where updates are rare, e.g. only do it if sqlSave fails? Is it possible to put a lock on the database while doing the updates and inserts to avoid problems with concurrency?

I'm working with PostgreSQL, but the example should be generic.

Thanks in advance
Mikkel

From hill0093 at umn.edu Mon May 2 22:42:04 2011
From: hill0093 at umn.edu (Hurr)
Date: Mon, 2 May 2011 13:42:04 -0700 (PDT)
Subject: [R] Copying to R a rectangular array from a Java class
In-Reply-To: <1304180854651-3486167.post@n4.nabble.com>
References: <1304180854651-3486167.post@n4.nabble.com>
Message-ID: <1304368924113-3490919.post@n4.nabble.com>

I am happy to report that the author and maintainer of rJava informed me
that the 2-dim array in Java needs sapply and .jevalArray as follows:

> conn2Arr <- sapply(.jfield(rJavaTst,sig="[[D","con2Arr"),.jevalArray)
> conn2ArrRet <- sapply(.jcall(rJavaTst,returnSig="[[D","retCon2Arr"),.jevalArray)
> # I can't identify any complaints so far
> print(conn2Arr)
     [,1] [,2] [,3]
[1,]  101  201  301
[2,]  102  202  302
[3,]  103  203  303
[4,]  104  204  304
> print(conn2ArrRet)
     [,1] [,2] [,3]
[1,]  101  201  301
[2,]  102  202  302
[3,]  103  203  303
[4,]  104  204  304

I know there are a few out there interested since I see I got a few views.
But I don't know the solution to the parameter-passing problem yet.

--
View this message in context: http://r.789695.n4.nabble.com/Copying-to-R-a-rectangular-array-from-a-Java-class-tp3486167p3490919.html
Sent from the R help mailing list archive at Nabble.com.
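
How sapply() assembles a matrix like the one printed above can be seen without a JVM. This is only a base-R sketch under stated assumptions: the list below is a made-up stand-in for the inner Java arrays (assuming the Java field is a double[3][4] holding these values), and identity() stands in for .jevalArray.

```r
# Stand-in for the Java double[3][4]: one numeric vector per inner array.
# The values are assumptions chosen to match the printed output.
inner_arrays <- list(
  c(101, 102, 103, 104),
  c(201, 202, 203, 204),
  c(301, 302, 303, 304)
)

# identity() plays the role of .jevalArray here; sapply() binds each
# returned vector as one column of the result.
m <- sapply(inner_arrays, identity)
dim(m)

# Transpose if the Java [i][j] orientation is wanted in R.
t(m)
```

Each inner array ends up as a column, so the R matrix is the transpose of the Java layout unless t() is applied.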
From Greg.Snow at imail.org Mon May 2 22:50:39 2011
From: Greg.Snow at imail.org (Greg Snow)
Date: Mon, 2 May 2011 14:50:39 -0600
Subject: [R] QQ plot for normality testing
In-Reply-To:
References:
Message-ID:

I would use the vis.test function along with vt.qqnorm (both in the TeachingDemos package). This will create several plots, one of which is your data; the rest are simulated normals with the same mean and standard deviation. If you can tell which plot stands out (and it is your real data), then that suggests that the data are not normal. If you cannot tell which plot is the real data, then that suggests that your data are close enough to normal.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Matevž Pavlic
> Sent: Saturday, April 30, 2011 11:28 AM
> To: r-help at r-project.org
> Subject: [R] QQ plot for normality testing
>
> Hi all,
>
> I am trying to test whether the distribution of my samples is normal
> with a QQ plot.
>
> I have values of water content in clays for a few hundred
> samples. Is the code:
>
> qqnorm(w) #w being water content
> qqline(w)
>
> sufficient?
>
> How do I know when I get the plots which distribution is normal and
> which is not?
>
> Thanks, m
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
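
The "pick the odd one out" idea in the reply above can be sketched in base R alone. This is an illustration of the principle, not the TeachingDemos implementation: qq_lineup is a hypothetical helper name, and the sample w is made-up skewed data standing in for the water contents.

```r
# Hypothetical helper, not the TeachingDemos code: draws a grid of normal
# QQ plots in which one randomly chosen panel holds the real data.
qq_lineup <- function(x, n_panels = 9) {
  pos  <- sample.int(n_panels, 1)          # panel that will show the real data
  side <- ceiling(sqrt(n_panels))
  old  <- par(mfrow = c(side, side), mar = c(2, 2, 1, 1))
  on.exit(par(old))                        # restore the previous layout
  for (i in seq_len(n_panels)) {
    # the other panels are normal samples with the same mean and sd as x
    y <- if (i == pos) x else rnorm(length(x), mean(x), sd(x))
    qqnorm(y, main = "", xlab = "", ylab = "")
    qqline(y)
  }
  invisible(pos)                           # returned so the guess can be checked
}

set.seed(42)
w <- rexp(200)          # made-up skewed sample standing in for the water contents
truth <- qq_lineup(w)
truth                   # compare with the panel you picked
```

Panels are filled row by row, so the returned position counts from the top left. If the real panel is easy to spot, the departure from normality is visible at this sample size.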
From mbmiller+l at gmail.com Mon May 2 22:53:58 2011
From: mbmiller+l at gmail.com (Mike Miller)
Date: Mon, 2 May 2011 15:53:58 -0500
Subject: [R] UNIX-like "cut" command in R
Message-ID:

The R "cut" command is entirely different from the UNIX "cut" command. The latter retains selected fields in a line of text. I can do that kind of manipulation using sub() or gsub(), but it is tedious. I assume there is an R function that will do this, but I don't know its name. Can you tell me?

I'm also guessing that there is a web page somewhere that will tell me how to do a lot of common GNU/UNIX/Linux "text util" command-line kinds of things in R. By that I mean by using R functions, not by making system calls. Does anyone know of such a web page?

Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota

From spector at stat.berkeley.edu Tue May 3 00:00:12 2011
From: spector at stat.berkeley.edu (Phil Spector)
Date: Mon, 2 May 2011 15:00:12 -0700 (PDT)
Subject: [R] Help converting a data.frame to ordered factors
In-Reply-To:
References:
Message-ID:

Robert -
It would be helpful to know what you've tried that didn't work, but the data.frame() function is the usual way of combining things like this:

> a = factor(sample(1:5,100,replace=TRUE),ordered=TRUE)
> b = factor(sample(1:5,100,replace=TRUE),ordered=TRUE)
> ab = data.frame(a,b)
> sapply(ab,class)
     a         b
[1,] "ordered" "ordered"
[2,] "factor"  "factor"

In particular, cbind() and matrix() will not work properly for what you're trying to do. Of course, if you explained exactly how you're creating the 96x34 array, there might be a better solution.
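
A minimal sketch of the column-by-column conversion, assuming the 96x34 responses sit in a data frame of integer codes 1 to 5. The object name likert, the simulated values and the label text are made up for illustration. Assigning into likert[] keeps the object a data frame, which is the point made above about avoiding cbind() and matrix().

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the real data: 96 cases, 34 items, coded 1..5
likert <- as.data.frame(matrix(sample(1:5, 96 * 34, replace = TRUE), nrow = 96))

labs <- c("strongly disagree", "disagree", "neutral", "agree", "strongly agree")

# Convert every column to an ordered factor; likert[] preserves the data frame
likert[] <- lapply(likert, factor, levels = 1:5, labels = labs, ordered = TRUE)

all(sapply(likert, is.ordered))
```

Going through as.matrix() or cbind() afterwards would drop the factor attributes again, so any later reshaping should stay within data frame operations.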
- Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spector at stat.berkeley.edu On Mon, 2 May 2011, Robert Cassidy wrote: > I have a 96x34 array of Likert scale data (96 cases, 34 items) of > ordered factors (strongly disagree, disagree, neutral, agree, strongly > agree) that are coded numerically (1 through 5). > > I cannot seem to convert this array (in any class) into ordered vectors. > > I have all the cases as vectors of ordered factors, but any which way > I reassemble those vectors loses the ordered factors and converts back > to numbers. > > Can someone tell me how to either convert the data.frame into ordered > factors OR how to assemble the vectors (of ordered factors) into an > array that preserves the factors. > > Many thanks in advance for any help. > Robert > > > -- > Robert Cassidy, PhD > Department of Psychology > Concordia University > 7141 Sherbrooke W. > Montreal (QC) H4B 1R6 > tel: (514) 848-2424 x2244 > fax: (514) 848-4523 > office: PY-119.2 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From stevenkennedy2263 at gmail.com Tue May 3 00:15:32 2011 From: stevenkennedy2263 at gmail.com (Steven Kennedy) Date: Tue, 3 May 2011 08:15:32 +1000 Subject: [R] INSERT OR UPDATE In-Reply-To: <371831.48337.qm@web65709.mail.ac4.yahoo.com> References: <371831.48337.qm@web65709.mail.ac4.yahoo.com> Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From luke-tierney at uiowa.edu Tue May 3 00:34:30 2011 From: luke-tierney at uiowa.edu (luke-tierney at uiowa.edu) Date: Mon, 2 May 2011 17:34:30 -0500 Subject: [R] setting options only inside functions In-Reply-To: <77EB52C6DD32BA4D87471DCD70C8D700042B90B4@NA-PA-VBE03.na.tibco.com> References: <98152.75771.qm@web28208.mail.ukl.yahoo.com><77EB52C6DD32BA4D87471DCD70C8D700042B8BB3@NA-PA-VBE03.na.tibco.com> <77EB52C6DD32BA4D87471DCD70C8D700042B90B4@NA-PA-VBE03.na.tibco.com> Message-ID: On Fri, 29 Apr 2011, William Dunlap wrote: >> -----Original Message----- >> From: r-help-bounces at r-project.org >> [mailto:r-help-bounces at r-project.org] On Behalf Of >> luke-tierney at uiowa.edu >> Sent: Friday, April 29, 2011 9:35 AM >> To: Jonathan Daily >> Cc: r-help at r-project.org; Hadley Wickham; Barry Rowlingson >> Subject: Re: [R] setting options only inside functions >> >> The Python solution does not extend, at least not cleanly, to things >> like dev on/ dev off or to Hadley's locale example. In any case if I >> am reading the Python source correctly on how they handle user >> interrupts this solution has the same non-robusness to user interrupts >> issue that Bill's initial solution had. >> >> As a basis I believe what we need is a mechanism that handles a >> setup, an action, and a cleanup, with setup and cleanup occurring with >> interrupts disablednand the action with interrupts enabled. Scheme's >> dynamic wind is similar, though I don't believe the scheme standard >> addresses interrupts and we don't need to worry about continuations, >> but some of the issues are similar. Probably we would want two >> flavors, one in which the action has to be a function that takes as a >> single argument the result produced by the setup code, and one in >> which the action can be an argument expression that is then evaluated >> at the appropriate place by laze evaluation. 
>>
>> This can be done at the R level except for the controlling of
>> interrupts (and possibly other asynchronous stuff)-- that would need a
>> new pair of primitives (suspendInterrupts/enableInterrupts or something
>> like that). There is something in the Haskell literature on this that
>> I have looked at a while back -- probably time to have another look.
>
> Luke,
>
> A similar problem is that if optionList contains an illegal
> option then setting options(optionList) will commit changes
> to .Options as it works its way down the optionList until it
> hits the illegal option, when it throws an error. Then the
> following on.exit is never called (it wouldn't have the output
> of options(optionList) to work on if it were called) and the
> initial settings in optionList stick around forever. E.g.,
>
> > withOptions <- function(optionList, expr) {
> +     oldOpt <- options(optionList)
> +     on.exit(options(oldOpt))
> +     expr
> + }
> > getOption("height")
> NULL
> > getOption("width")
> [1] 80
> > withOptions(list(height=10, width=-2), 666)
> Error in options(optionList) :
>   invalid 'width' parameter, allowed 10...10000
> > getOption("height")
> [1] 10
> > getOption("width")
> [1] 80
>
> I haven't checked to see if par() works in the same way - it
> does in S+.
>
> An ignoreInterrupts(expr) function would not help in that case.

It would be solving an orthogonal problem.

> Making options() (and par()) atomic operations would help, but that
> may be a lot of work.

But it would be the right thing to do for this purpose, either by creating an atomic version just for use in this context or by having a withOptions construct recursively work through each option.

> options() might also warn but not change
> .Options if there were an attempt to set an illegal option.

Seems more or less the same as making options() atomic.
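
For the partial-commit problem discussed above, one user-level workaround can be sketched as follows. This is only an illustration under stated assumptions, not an atomic options() and not anything proposed in this thread: it records the current values by name before calling options(), so the on.exit handler is already registered when the call fails part-way. It does nothing about the separate interrupt issue. The option name demo.height is made up.

```r
withOptions <- function(optionList, expr) {
  nms <- names(optionList)
  oldOpt <- lapply(nms, getOption)   # NULL for options that are currently unset
  names(oldOpt) <- nms
  on.exit(options(oldOpt))           # registered before anything is changed
  options(optionList)                # may fail after setting some options
  expr                               # forced here, with the new options in effect
}

# The failing call leaves nothing behind this time
res <- tryCatch(withOptions(list(demo.height = 10, width = -2), 666),
                error = function(e) "error")
getOption("demo.height")
```

Restoring an option to NULL removes it again, so options that were unset beforehand end up unset afterwards.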
Best, luke > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > >> >> >> On Thu, 28 Apr 2011, Jonathan Daily wrote: >> >>> I would also love to see this implemented in R, as my >> current solution >>> to the issue of doing tons of open/close, dev/dev.off, etc. >> is to use >>> snippets in my IDE, and in the end I feel like it is a hack job. A >>> pythonic "with" function would also solve most of the >> situations where >>> I have had to use awkward try or tryCatch calls. I would be >> willing to >>> help with this project, even if it is just testing. >>> >>> On Wed, Apr 27, 2011 at 5:43 PM, Barry Rowlingson >>> wrote: >>>>> but it's a little clumsy, because >>>>> >>>>> with_connection(file("myfile.txt"), {do stuff...}) >>>>> >>>>> isn't very useful because you have no way to reference >> the connection >>>>> that you're using. Ruby's blocks have arguments which >> would require >>>>> big changes to R's syntax. ?One option would to use pronouns: >>>> >>>> ?Looking very much like python 'with' statements: >>>> >>>> http://effbot.org/zone/python-with-statement.htm >>>> >>>> ?Implemented via the 'with' statement which can operate on anything >>>> that has a __enter__ and an __exit__ method. Very neat. >>>> >>>> Barry >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >>> >>> >>> >> >> -- >> Luke Tierney >> Statistics and Actuarial Science >> Ralph E. Wareham Professor of Mathematical Sciences >> University of Iowa Phone: 319-335-3386 >> Department of Statistics and Fax: 319-335-3017 >> Actuarial Science >> 241 Schaeffer Hall email: luke at stat.uiowa.edu >> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu >> > -- Luke Tierney Statistics and Actuarial Science Ralph E. 
Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

From A.Robinson at ms.unimelb.edu.au Tue May 3 00:35:23 2011
From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson)
Date: Tue, 3 May 2011 08:35:23 +1000
Subject: [R] UNIX-like "cut" command in R
In-Reply-To:
References:
Message-ID: <20110502223523.GJ48756@ms.unimelb.edu.au>

Hi Mike,

try substr()

Cheers

Andrew

On Mon, May 02, 2011 at 03:53:58PM -0500, Mike Miller wrote:
> The R "cut" command is entirely different from the UNIX "cut" command.
> The latter retains selected fields in a line of text. I can do that kind
> of manipulation using sub() or gsub(), but it is tedious. I assume there
> is an R function that will do this, but I don't know its name. Can you
> tell me?
>
> I'm also guessing that there is a web page somewhere that will tell me how
> to do a lot of common GNU/UNIX/Linux "text util" command-line kinds of
> things in R. By that I mean by using R functions, not by making system
> calls. Does anyone know of such a web page?
>
> Thanks in advance.
>
> Mike
>
> --
> Michael B. Miller, Ph.D.
> Minnesota Center for Twin and Family Research
> Department of Psychology
> University of Minnesota
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
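
As a rough illustration of the two usual modes of UNIX cut: substr() covers the character-position case (cut -c), and strsplit() can cover the delimited-field case (cut -d -f). The helper name cut_fields and the sample strings are made up for this sketch. It is not an existing R function.

```r
lines <- c("alpha:beta:gamma:delta", "one:two:three:four")

# Hypothetical helper, roughly like `cut -d: -f2,4`
cut_fields <- function(x, fields, delim = ":") {
  parts <- strsplit(x, delim, fixed = TRUE)   # split each line on the delimiter
  vapply(parts,
         function(p) paste(p[fields], collapse = delim),
         character(1))
}

cut_fields(lines, c(2, 4))   # fields 2 and 4 of each line
substr(lines, 1, 5)          # roughly like `cut -c1-5`
```

A line with fewer fields than requested yields the string "NA" in the missing positions, so real input may need a check on the field count first.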
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Tue May 3 00:41:54 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Tue, 3 May 2011 08:41:54 +1000 Subject: [R] Optimization - n dimension matrix In-Reply-To: <1304365279295-3490772.post@n4.nabble.com> References: <1304365279295-3490772.post@n4.nabble.com> Message-ID: <20110502224154.GK48756@ms.unimelb.edu.au> Hello, optim() works for more than one dimension. You might also find this page helpful: http://cran.r-project.org/web/views/Optimization.html Cheers Andrew On Mon, May 02, 2011 at 12:41:19PM -0700, petrolmaniac wrote: > Dear all, > > I am facing the following problem in optimization: > > w = (d, o1, ..., op, m1, ..., mq) is a 1 + p + q vector > > I want to determine: > > w = argmin (a - d(w))' A (a - d(w)) > > where a is a 1xK marix, A is the covariance matrix of vector a, d(w) is a > 1xK vector which parameters are functions of parameters d, o1 .. op, m1 .. > mq. > > Is there some function to solve this problem easily? I know optim() and > ucminf() for one-dimensional optimization (I believe). Are there some tools > for such n-dimensional problem? > > Kind regards, > > C. > -- > > -- > View this message in context: http://r.789695.n4.nabble.com/Optimization-n-dimension-matrix-tp3490772p3490772.html > Sent from the R help mailing list archive at Nabble.com. 
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/

From mi2kelgrum at yahoo.com Tue May 3 00:45:34 2011
From: mi2kelgrum at yahoo.com (Mikkel Grum)
Date: Mon, 2 May 2011 15:45:34 -0700 (PDT)
Subject: [R] INSERT OR UPDATE
Message-ID: <482778.60381.qm@web65707.mail.ac4.yahoo.com>

Thanks Steven. It obviously makes sense to loop on the much smaller dataset that is being added than the set of everything that might already be in the database. I've added your message in plain text, so that others can see it too.

Mikkel

From: Steven Kennedy
Subject: Re: [R] INSERT OR UPDATE
To: "Mikkel Grum"
Cc: "R Help"
Date: Monday, May 2, 2011, 5:15 PM

Rather than selecting all the keys, then having R loop through them, why not have postgres do it for you with something like:

#go through each line in our entry table
for (i in 1:dim(tbl)[1]){
    #check if the pkey already exists
    q <- paste("SELECT key1, key2 FROM tbl WHERE key1=", tbl[i,1],
        " AND key2=", tbl[i,2], sep="")
    yes <- sqlQuery(pg, q, as.is = TRUE)
    if (dim(yes)[1] == 1){
        #update the row if it exists
        sqlUpdate(pg, tbl[i,], "tbl", index = c("key1", "key2"))
    } else {
        #add the row if it doesn't
        sqlSave(pg, tbl[i,], "tbl", append = TRUE, rownames = FALSE)
    }
}

This should work fine for small or large tables (especially if you index the large table that doesn't change much). ? From carl at witthoft.com Tue May 3 01:14:00 2011 From: carl at witthoft.com (Carl Witthoft) Date: Mon, 02 May 2011 19:14:00 -0400 Subject: [R] easy way to do a 2-D fit to an array of data? Message-ID: <4DBF3AB8.3050005@witthoft.com> Hi, I've got a matrix, Z, of values representing (as it happens) optical power at each pixel location. Since I know in advance I've got a single, convex peak, I would like to do a 2D parabolic fit of the form Z = poly((x+y),2) where x and y are the x,y coordinates of each pixel (or equivalently, the row, column numbers). Is there an R function that lets me easily implement that? I've started down the path of something like zvec <- as.vector(Z), and creating applicable x,y vectors by something like (where for the sake of argument Z is 128x128) foo<-matrix(seq(1,128),128,128) xvec <- as.vector(foo) yvec <- as.vector(t(foo)) at which point I can feed zvec, xvec, yvec to lm() . I'm hopeful someone can point me to a much easier way to do the same thing. Oh, and if there's a 2-D splinefunction generator, that would work for me as well. thanks Carl From A.Robinson at ms.unimelb.edu.au Tue May 3 01:18:45 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Tue, 3 May 2011 09:18:45 +1000 Subject: [R] Help with coloring segments on a plot In-Reply-To: References: Message-ID: <20110502231845.GL48756@ms.unimelb.edu.au> Hi Paul, not to seem naive, but have you actually tried the code below? It doesn't seem that you have, from your text. I think that if you try it and hack then ask concrete questions (e.g. can anyone explain why the following simple, reproducible, commented code does not work) then you'll have more luck. Best wishes Andrew On Mon, May 02, 2011 at 02:26:16PM -0400, Paul Davison wrote: > Hi. I need a very short piece of help regarding colouring segments plotted > on a graph. 
> > When I am plotting segments for the graph, I am using "red" and "darkgreen > for the values "1" and "2" respectively. Heres the relevant line of code in > R: > > + col = c("red", "darkgreen")[line.colour.value]) > > I just need to extend this to refer to a larger range of numbers from 1 to > 10, to plot the segments in ten different colours. The values are just the > first ten integers: 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 , 10 > Each of the ten values will refer to a different colour just as "1" would > plot a segment in red and "2" would plot a segment in darkgreen. > > The only other condition I need is that the colours be in hex format. Would > this be along the right lines? : > > + col = c("#FFFFFF", "#FFFFFF", "#FFFFFF", "#FFFFFF", "#FFFFFF", "#FFFFFF", > "#FFFFFF", "#FFFFFF", "#FFFFFF", "#FFFFFF",)[line.colour.value]) > > Or would I need to adjust the code in other places too? > > I have copied the code I am using below. I have also copied below a small > excerpt of the simple data I am plotting - with the headers at the top. > > Thank you so much for your help. > > Paul Davison > University of Cambridge, UK > > > > > > data = read.csv("r.test.data.csv", header = TRUE) > > with(data, { > + par(bg="#0B5FA5") > + par(lwd=0.01) > + plot(NA, NA, > + xlim = range(start.x.co.ordinate, end.x.co.ordinate, 50000), > + ylim = range(start.y.co.ordinate, end.y.co.ordinate, 50000), > + type = "n", ann = FALSE, axes = FALSE) > + segments(start.x.co.ordinate, start.y.co.ordinate, > + end.x.co.ordinate, end.y.co.ordinate, > + col = c("red", "darkgreen")[line.colour.value]) > + title(main = "10th April 1991", > + xlab = "Pandora", > + ylab = "Luna") > + }) > >> quartz.save("sample4.png","png") > > > The values in the following data table for the column "line.colour.value" > are just 1s and 2s. Ideally I would have numbers of 1 through to 10 and each > one would plot a different coloured (using a hex value) segment. 
> > > start.x.co.ordinate start.y.co.ordinate end.x.co.ordinate > end.y.co.ordinate line.colour.value > 300 300 2289 20289 2 300 300 2692 20467 1 300 300 3010 20608 2 300 300 > 2727 19828 1 300 300 2606 20056 2 300 300 16244 21416 1 300 300 16154 21899 > 2 300 300 16941 21434 1 300 300 17356 20205 2 300 300 16928 21245 1 300 300 > 16011 21024 2 300 300 17323 20053 1 300 300 17312 20435 2 300 300 17175 > 21259 1 300 300 16851 21268 2 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From ckalexa2 at ncsu.edu Mon May 2 23:22:57 2011 From: ckalexa2 at ncsu.edu (Clemontina Alexander) Date: Mon, 2 May 2011 17:22:57 -0400 Subject: [R] Lasso with Categorical Variables In-Reply-To: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net> References: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net> Message-ID: Thanks for your response, but I guess I didn't make my question clear. I am already familiar with the concept of dummy variables and regression in R. My question is, can the "lars" package (or some other lasso algorithm) handle factors? I did use dummy variables in my original data, but lars (lasso) only shrank the coefficients of some of the levels of one factor to 0. Is this the correct thing to do? 
Because intuitively it seems like I would want to shrink the whole factor coefficient to 0. If this is correct, what is the interpretation? For example, for X1, if lasso drops the coefficient for levels A and B, but not C and D, does this mean that X1 should be included in the model? Thanks. On Mon, May 2, 2011 at 2:47 PM, David Winsemius wrote: > > On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote: > >> Hi, >> >> On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander >> wrote: >>> >>> Hi! This is my first time posting. I've read the general rules and >>> guidelines, but please bear with me if I make some fatal error in >>> posting. Anyway, I have a continuous response and 29 predictors made >>> up of continuous variables and nominal and ordinal categorical >>> variables. I'd like to do lasso on these, but I get an error. The way >>> I am using "lars" doesn't allow for the factors. Is there a special >>> option or some other method in order to do lasso with cat. variables? >>> >>> Here is and example (considering ordinal variables as just nominal): >>> >>> set.seed(1) >>> Y <- rnorm(10,0,1) >>> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE)) >>> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE)) >>> X3 <- sample(x=30:55, size=10, replace=TRUE) ?# think age >>> X4 <- rchisq(10, df=4, ncp=0) >>> X <- data.frame(X1,X2,X3,X4) >>> >>>> str(X) >>> >>> 'data.frame': ? 10 obs. of ?4 variables: >>> ?$ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2 >>> ?$ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3 >>> ?$ X3: int ?51 46 50 44 43 50 30 42 49 48 >>> ?$ X4: num ?2.86 1.55 1.94 2.45 2.75 ... >>> >>> >>> I'd like to do: >>> obj <- lars(x=X, y=Y, type = "lasso") >>> >>> Instead, what I have been doing is converting all data to continuous >>> but I think this is really bad! >> >> Yeah, it is. 
>> >> Check out the "Categorical Predictor Variables" section here for a way >> to handle such predictor vars: >> http://www.psychstat.missouristate.edu/multibook/mlt08m.html > > Steve's citation is somewhat helpful, but not sufficient to take the next > steps. You can find details regarding the mechanics of typical linear > regression in R on the ?lm page where you find that the factor variables are > typically handled by model.matrix. See below: > >> model.matrix(~X1 + X2 + X3 + X4, X) > ? (Intercept) X1B X1C X1D X2F X2G X2H X2I X3 ? ? ? ?X4 > 1 ? ? ? ? ? ?1 ? 0 ? 0 ? 1 ? 0 ? 1 ? 0 ? 0 51 2.8640884 > 2 ? ? ? ? ? ?1 ? 0 ? 0 ? 0 ? 0 ? 0 ? 1 ? 0 46 1.5462243 > 3 ? ? ? ? ? ?1 ? 0 ? 1 ? 0 ? 0 ? 1 ? 0 ? 0 50 1.9430901 > 4 ? ? ? ? ? ?1 ? 0 ? 0 ? 0 ? 1 ? 0 ? 0 ? 0 44 2.4504180 > 5 ? ? ? ? ? ?1 ? 1 ? 0 ? 0 ? 0 ? 0 ? 0 ? 1 43 2.7535052 > 6 ? ? ? ? ? ?1 ? 1 ? 0 ? 0 ? 0 ? 0 ? 0 ? 1 50 1.6200326 > 7 ? ? ? ? ? ?1 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 1 30 0.5750533 > 8 ? ? ? ? ? ?1 ? 1 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 42 5.9224777 > 9 ? ? ? ? ? ?1 ? 0 ? 0 ? 1 ? 0 ? 0 ? 0 ? 1 49 2.0401528 > 10 ? ? ? ? ? 1 ? 1 ? 0 ? 0 ? 0 ? 1 ? 0 ? 0 48 6.2995288 > attr(,"assign") > ?[1] 0 1 1 1 2 2 2 2 3 4 > attr(,"contrasts") > attr(,"contrasts")$X1 > [1] "contr.treatment" > > attr(,"contrasts")$X2 > [1] "contr.treatment" > > The numeric variables are passed through, while the dummy variables for > factor columns are constructed (as treatment contrasts) and the whole thing > it returned in a neat package. > > -- > David. >> >> HTH, >> -steve >> > -- > David Winsemius, MD > Heritage Laboratories > West Hartford, CT > > From A.Robinson at ms.unimelb.edu.au Tue May 3 01:59:43 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Tue, 3 May 2011 09:59:43 +1000 Subject: [R] Simulation Questions In-Reply-To: References: Message-ID: <20110502235943.GM48756@ms.unimelb.edu.au> Hi Shane, it sounds to me as though you have a fairly well-defined problem. 
You want to generate random numbers with a specific mean, variance, and correlation with another random variable. I would reverse-engineer the functions for simple linear regression to get a result like

y = beta_0 + beta_1 * x + rnorm(n, 0, sigma)

and use that as the basis of generating random numbers.

Not sure how to interpret the second question ...

Cheers

Andrew

On Sun, May 01, 2011 at 12:33:41AM -0400, Shane Phillips wrote:
> I have the following script for generating a dataset. It works like a champ except for a couple of things.
>
> 1. I need the variables "itbs" and "map" to be negatively correlated with the binomial variable "lunch" (around -0.21 and -0.24, respectively). The binomial variable "lunch" needs to remain unchanged.
> 2. While my generated variables do come out with the desired means and correlations, the distribution is very narrow and only represents a small portion of the possible scores. Can I force it to encompass a wider range of scores, while maintaining my desired parameters and correlations?
>
> Please help...
>
> Shane
>
> Script follows...
>
>
> #Number the subjects
> subject=1:1000
> #Assign a treatment condition from a binomial distribution with a probability of 0.13
> treat=rbinom(1*1000,1,.13)
> #Assign a lunch status condition from a binomial distribution with a probability of 0.35
> lunch=rbinom(1*1000,1,.35)
> #Generate age in months from a random normal distribution with mean of 87 and sd of 2
> age=rnorm(1000,87,2)
> #invoke the MASS package
> require(MASS)
> #Establish the covariance matrix for MAP, ITBS and CogAT scores
> sigma <- matrix(c(1, 0.84, 0.59, 0.84, 1, 0.56, 0.59, 0.56, 1), ncol = 3)
> #Establish MAP as a random normal variable with mean of 200 and sd of 9
> map <- rnorm(1000, 200, 9)
> #Establish ITBS as a random normal variable with mean of 175 and sd of 15
> itbs <- rnorm(1000, 175, 15)
> #Establish CogAT as a random normal variable with mean of 100 and sd of 16
> cogat<-rnorm(1000,100,16)
> #Create a dataframe of MAP, ITBS, and CogAT
> data <- data.frame(map, itbs, cogat)
> #Draw from the multivariate distribution defined by MAP, ITBS, and CogAT means and the covariance matrix
> sim <- mvrnorm(1000, mu=colMeans(data), sigma, empirical=FALSE)
> #Set growth at 0
> growth=0
> #Combine elements into a single dataset
> simtest=data.frame (subject=subject, treat=treat,lunch, age=round(age,0),round(sim,0),growth)
> #Set mean growth by treatment condition with treated subjects having a mean growth of 1.5 and non-treated having a mean growth of 0.1
> simtest<-transform(simtest, growth=rnorm(1000,m=ifelse(treat==0,0.1,1.5),s=1))
> simtest
> cor (simtest)
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
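
The regression-based idea can be sketched as below, using the means, standard deviations and target correlations from the quoted question. The helper name sim_correlated is made up, and the sample size is enlarged only so that the sample correlations sit close to their targets. For a target correlation rho with x, the slope is rho * sd_y / sd(x) and the residual standard deviation is sd_y * sqrt(1 - rho^2), which gives y the requested spread. This sketch handles each score separately and does not reproduce the 0.84 correlation between the two scores.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
n     <- 100000                    # large n so sample values sit near the targets
lunch <- rbinom(n, 1, 0.35)        # binary variable, left unchanged

# Hypothetical helper: draw y with a chosen mean, sd and correlation with x
sim_correlated <- function(x, mean_y, sd_y, rho) {
  beta1 <- rho * sd_y / sd(x)      # slope implied by the target correlation
  sigma <- sd_y * sqrt(1 - rho^2)  # residual standard deviation
  mean_y + beta1 * (x - mean(x)) + rnorm(length(x), 0, sigma)
}

itbs <- sim_correlated(lunch, 175, 15, -0.21)
map  <- sim_correlated(lunch, 200,  9, -0.24)

round(c(cor(itbs, lunch), cor(map, lunch), sd(itbs), sd(map)), 2)
```

On the narrow range in the second question: the matrix passed to mvrnorm in the quoted script has ones on its diagonal, so it acts as a correlation matrix and the simulated scores get unit variance. Scaling it by the standard deviations, for example diag(sds) %*% sigma %*% diag(sds) with sds = c(9, 15, 16), would be one way to widen the spread.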
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From richcmwang at gmail.com Mon May 2 23:48:49 2011 From: richcmwang at gmail.com (Richard Wang) Date: Mon, 2 May 2011 22:48:49 +0100 Subject: [R] install rdcomclient source Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From junchuanzeng at gmail.com Tue May 3 01:26:22 2011 From: junchuanzeng at gmail.com (jessezeng) Date: Mon, 2 May 2011 16:26:22 -0700 (PDT) Subject: [R] Impulse response analysis within package vars In-Reply-To: References: <489ADED6.7000603@boeser.ch> Message-ID: <1304378782226-3491284.post@n4.nabble.com> Hi, I have a similar question: ir <- irf(varsumm, impulse=c("prod", "rea", "rpo") n.ahead=20, runs=500, ci=0.95) will calculate the orthogonalized impulse responses from "prod", "rea", and "rpo", i.e. a (1, 1, 1)' vector. What do I need to do to make the impulse (-1, 1, 1)', i.e. I want the the first shock to be negative 1 unit? Thanks, Jesse -- View this message in context: http://r.789695.n4.nabble.com/Impulse-response-analysis-within-package-vars-tp841596p3491284.html Sent from the R help mailing list archive at Nabble.com. From mikesmith00 at yahoo.com Tue May 3 01:42:33 2011 From: mikesmith00 at yahoo.com (Mike Smith) Date: Mon, 2 May 2011 16:42:33 -0700 (PDT) Subject: [R] List of Data Frames Message-ID: <723857.31535.qm@web35408.mail.mud.yahoo.com> I'm trying to create a list of Data Frames. ?I have 17 data frames that I need to move through in a loop, but if I simply make a list of them, then they do not stay data frames, and I can't sort through them. 
I tried to create an array, but the data frames can have anywhere from 14-16 rows, and I couldn't find a way to make a variable size array. If you have any ideas, I would greatly appreciate any help, as I'm trying to learn R, and decided to apply it to a project that I have been working on. My goal is splitting a sports season into games per week, and then do statistics on each week, but have an average running up to that point in the season. Thus the list would be indexed by weeks, and then there's a data frame of the game and all relevant statistics.

Thank You,
Mike Smith

From jerome.asselin.stat at gmail.com Tue May 3 02:56:16 2011
From: jerome.asselin.stat at gmail.com (Jerome Asselin)
Date: Mon, 02 May 2011 20:56:16 -0400
Subject: [R] List of Data Frames
In-Reply-To: <723857.31535.qm@web35408.mail.mud.yahoo.com>
References: <723857.31535.qm@web35408.mail.mud.yahoo.com>
Message-ID: <1304384176.9022.30.camel@localhost>

On Mon, 2011-05-02 at 16:42 -0700, Mike Smith wrote:
> I'm trying to create a list of Data Frames. I have 17 data frames
> that I need to move through in a loop, but if I simply make a list of
> them, then they do not stay data frames, and I can't sort through
> them. I tried to create an array, but the data frames can have
> anywhere from 14-16 rows, and I couldn't find a way to make a variable
> size array. If you have any ideas, I would greatly appreciate any
> help, as I'm trying to learn R, and decided to apply it to a project
> that I have been working on. My goal is splitting a sports season
> into games per week, and then do statistics on each week, but have an
> average running up to that point in the season. Thus the list would
> be indexed by weeks, and then there's a data frame of the game and
> all relevant statistics.

My understanding is that you want to have one data frame per week. I question whether it is necessary to split these data frames.
I would design a single data frame with one column to identify the week and then work on subsets of that data frame to calculate the weekly stats or cumulative indexes as you wish. Jerome From rolf.turner at xtra.co.nz Tue May 3 03:00:44 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Tue, 03 May 2011 13:00:44 +1200 Subject: [R] List of Data Frames In-Reply-To: <723857.31535.qm@web35408.mail.mud.yahoo.com> References: <723857.31535.qm@web35408.mail.mud.yahoo.com> Message-ID: <4DBF53BC.40608@xtra.co.nz> On 03/05/11 11:42, Mike Smith wrote: > I'm trying to create a list of Data Frames. I have 17 data frames that I need to move through in a loop, but if I simply make a list of them, then they do not stay data frames, That is simply not true. Just ***how*** did you ``make a list of them''??? > and I can't sort through them. I tried to create an array, but the data frames can have anywhere from 14-16 rows, and I couldn't find a way to make a variable size array. If you have any ideas, I would greatly appreciate any help, as I'm trying to learn R, and decided to apply it to a project that I have been working on. My goal is splitting a sports season into games per week, and then do statistics on each week, but have an average running up to that point in the season. Thus the list would be indexed by weeks, and then there's a data frame of the game and all relevant statistics. You can make a list of data frames with syntax something like L <- list(DF1, DF2,etc.) where DF1, ... DF17 are your data frames. If your data frames all have the same number of columns (and the column names are the same) you might want to rbind() them together into a single data frame. 
If the data frames correspond to ``week'' then you might want to add a ``week'' column to each data frame before doing the rbind(); the value of week would be constant over the rows of each of your original data frames, but would of course vary over rows in your ``big'' data frame (the object returned by rbind). It's very hard to recommend anything specific, since your question was so vague. I suggest that you read the Posting Guide. cheers, Rolf Turner From dwinsemius at comcast.net Tue May 3 03:14:51 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 2 May 2011 18:14:51 -0700 Subject: [R] Lasso with Categorical Variables In-Reply-To: References: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net> Message-ID: <3AEC52F3-4A07-474C-9B04-0A339FCD791B@comcast.net> On May 2, 2011, at 2:22 PM, Clemontina Alexander wrote: > Thanks for your response, but I guess I didn't make my question clear. > I am already familiar with the concept of dummy variables and > regression in R. My question is, can the "lars" package (or some other > lasso algorithm) handle factors? The error message when you do so and the help page make it fairly clear that it does not. > I did use dummy variables in my > original data, but lars (lasso) only shrank the coefficients of some > of the levels of one factor to 0. You certainly gave no evidence that would lead anyone to think that you did so. Please try to understand that just converting factors to 'numeric' is not the same as creating dummy variables. -- David. > Is this the correct thing to do? > Because intuitively it seems like I would want to shrink the whole > factor coefficient to 0. If this is correct, what is the > interpretation? For example, for X1, if lasso drops the coefficient > for levels A and B, but not C and D, does this mean that X1 should be > included in the model? > Thanks. 
> > > > On Mon, May 2, 2011 at 2:47 PM, David Winsemius > wrote: >> >> On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote: >> >>> Hi, >>> >>> On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander >> > >>> wrote: >>>> >>>> Hi! This is my first time posting. I've read the general rules and >>>> guidelines, but please bear with me if I make some fatal error in >>>> posting. Anyway, I have a continuous response and 29 predictors >>>> made >>>> up of continuous variables and nominal and ordinal categorical >>>> variables. I'd like to do lasso on these, but I get an error. The >>>> way >>>> I am using "lars" doesn't allow for the factors. Is there a special >>>> option or some other method in order to do lasso with cat. >>>> variables? >>>> >>>> Here is and example (considering ordinal variables as just >>>> nominal): >>>> >>>> set.seed(1) >>>> Y <- rnorm(10,0,1) >>>> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE)) >>>> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE)) >>>> X3 <- sample(x=30:55, size=10, replace=TRUE) # think age >>>> X4 <- rchisq(10, df=4, ncp=0) >>>> X <- data.frame(X1,X2,X3,X4) >>>> >>>>> str(X) >>>> >>>> 'data.frame': 10 obs. of 4 variables: >>>> $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2 >>>> $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3 >>>> $ X3: int 51 46 50 44 43 50 30 42 49 48 >>>> $ X4: num 2.86 1.55 1.94 2.45 2.75 ... >>>> >>>> >>>> I'd like to do: >>>> obj <- lars(x=X, y=Y, type = "lasso") >>>> >>>> Instead, what I have been doing is converting all data to >>>> continuous >>>> but I think this is really bad! >>> >>> Yeah, it is. >>> >>> Check out the "Categorical Predictor Variables" section here for a >>> way >>> to handle such predictor vars: >>> http://www.psychstat.missouristate.edu/multibook/mlt08m.html >> >> Steve's citation is somewhat helpful, but not sufficient to take >> the next >> steps. 
You can find details regarding the mechanics of typical linear >> regression in R on the ?lm page where you find that the factor >> variables are >> typically handled by model.matrix. See below: >> >>> model.matrix(~X1 + X2 + X3 + X4, X) >> (Intercept) X1B X1C X1D X2F X2G X2H X2I X3 X4 >> 1 1 0 0 1 0 1 0 0 51 2.8640884 >> 2 1 0 0 0 0 0 1 0 46 1.5462243 >> 3 1 0 1 0 0 1 0 0 50 1.9430901 >> 4 1 0 0 0 1 0 0 0 44 2.4504180 >> 5 1 1 0 0 0 0 0 1 43 2.7535052 >> 6 1 1 0 0 0 0 0 1 50 1.6200326 >> 7 1 0 0 0 0 0 0 1 30 0.5750533 >> 8 1 1 0 0 0 0 0 0 42 5.9224777 >> 9 1 0 0 1 0 0 0 1 49 2.0401528 >> 10 1 1 0 0 0 1 0 0 48 6.2995288 >> attr(,"assign") >> [1] 0 1 1 1 2 2 2 2 3 4 >> attr(,"contrasts") >> attr(,"contrasts")$X1 >> [1] "contr.treatment" >> >> attr(,"contrasts")$X2 >> [1] "contr.treatment" >> >> The numeric variables are passed through, while the dummy variables >> for >> factor columns are constructed (as treatment contrasts) and the >> whole thing >> it returned in a neat package. >> >> -- >> David. >>> >>> HTH, >>> -steve >>> >> -- >> David Winsemius, MD >> Heritage Laboratories >> West Hartford, CT >> >> David Winsemius, MD Heritage Laboratories West Hartford, CT From A.Robinson at ms.unimelb.edu.au Tue May 3 03:27:38 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Tue, 3 May 2011 11:27:38 +1000 Subject: [R] Lasso with Categorical Variables In-Reply-To: References: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net> Message-ID: <20110503012738.GN48756@ms.unimelb.edu.au> On Mon, May 02, 2011 at 05:22:57PM -0400, Clemontina Alexander wrote: > Thanks for your response, but I guess I didn't make my question clear. > I am already familiar with the concept of dummy variables and > regression in R. My question is, can the "lars" package (or some other > lasso algorithm) handle factors? I did use dummy variables in my > original data, but lars (lasso) only shrank the coefficients of some > of the levels of one factor to 0. 
> Is this the correct thing to do?

It's because, so far as the linear model is concerned, factors are a convenience to help us handle the dummy variables. So, yes, it's the correct thing to do. It sounds to me as though you are after a shrinkage device that will treat the factor as a whole.

> Because intuitively it seems like I would want to shrink the whole
> factor coefficient to 0. If this is correct, what is the
> interpretation? For example, for X1, if lasso drops the coefficient
> for levels A and B, but not C and D, does this mean that X1 should be
> included in the model?

It means that X1 should be recoded to be C, D, and the rest.

Cheers

Andrew

> Thanks.
>
> On Mon, May 2, 2011 at 2:47 PM, David Winsemius wrote:
> >
> > On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote:
> >
> >> Hi,
> >>
> >> On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander
> >> wrote:
> >>>
> >>> Hi! This is my first time posting. I've read the general rules and
> >>> guidelines, but please bear with me if I make some fatal error in
> >>> posting. Anyway, I have a continuous response and 29 predictors made
> >>> up of continuous variables and nominal and ordinal categorical
> >>> variables. I'd like to do lasso on these, but I get an error. The way
> >>> I am using "lars" doesn't allow for the factors. Is there a special
> >>> option or some other method in order to do lasso with cat. variables?
> >>>
> >>> Here is and example (considering ordinal variables as just nominal):
> >>>
> >>> set.seed(1)
> >>> Y <- rnorm(10,0,1)
> >>> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE))
> >>> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE))
> >>> X3 <- sample(x=30:55, size=10, replace=TRUE)  # think age
> >>> X4 <- rchisq(10, df=4, ncp=0)
> >>> X <- data.frame(X1,X2,X3,X4)
> >>>
> >>>> str(X)
> >>>
> >>> 'data.frame':   10 obs. of  4 variables:
> >>>  $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2
> >>>  $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3
> >>>  $ X3: int  51 46 50 44 43 50 30 42 49 48
> >>>  $ X4: num  2.86 1.55 1.94 2.45 2.75 ...
> >>>
> >>>
> >>> I'd like to do:
> >>> obj <- lars(x=X, y=Y, type = "lasso")
> >>>
> >>> Instead, what I have been doing is converting all data to continuous
> >>> but I think this is really bad!
> >>
> >> Yeah, it is.
> >>
> >> Check out the "Categorical Predictor Variables" section here for a way
> >> to handle such predictor vars:
> >> http://www.psychstat.missouristate.edu/multibook/mlt08m.html
> >
> > Steve's citation is somewhat helpful, but not sufficient to take the next
> > steps. You can find details regarding the mechanics of typical linear
> > regression in R on the ?lm page where you find that the factor variables are
> > typically handled by model.matrix. See below:
> >
> >> model.matrix(~X1 + X2 + X3 + X4, X)
> >    (Intercept) X1B X1C X1D X2F X2G X2H X2I X3        X4
> > 1            1   0   0   1   0   1   0   0 51 2.8640884
> > 2            1   0   0   0   0   0   1   0 46 1.5462243
> > 3            1   0   1   0   0   1   0   0 50 1.9430901
> > 4            1   0   0   0   1   0   0   0 44 2.4504180
> > 5            1   1   0   0   0   0   0   1 43 2.7535052
> > 6            1   1   0   0   0   0   0   1 50 1.6200326
> > 7            1   0   0   0   0   0   0   1 30 0.5750533
> > 8            1   1   0   0   0   0   0   0 42 5.9224777
> > 9            1   0   0   1   0   0   0   1 49 2.0401528
> > 10           1   1   0   0   0   1   0   0 48 6.2995288
> > attr(,"assign")
> >  [1] 0 1 1 1 2 2 2 2 3 4
> > attr(,"contrasts")
> > attr(,"contrasts")$X1
> > [1] "contr.treatment"
> >
> > attr(,"contrasts")$X2
> > [1] "contr.treatment"
> >
> > The numeric variables are passed through, while the dummy variables for
> > factor columns are constructed (as treatment contrasts) and the whole thing
> > it returned in a neat package.
> >
> > --
> > David.
> >>
> >> HTH,
> >> -steve
> >>
> > --
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
> >
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics    Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr      Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/

From rvaradhan at jhmi.edu Tue May 3 04:08:01 2011
From: rvaradhan at jhmi.edu (Ravi Varadhan)
Date: Mon, 2 May 2011 22:08:01 -0400
Subject: [R] easy way to do a 2-D fit to an array of data?
In-Reply-To: <4DBF3AB8.3050005@witthoft.com>
References: <4DBF3AB8.3050005@witthoft.com>
Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669E9F316@WIGGUMVS.win.ad.jhu.edu>

You may want to consider spatial::surf.ls

Or, as a simplistic approach, you could use `lm' to fit a model such as:

E[Z | x, y] = a + b(x - x0)^2 + c(y - y0)^2

where (x0, y0) is the location of the maximum.

Ravi.
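One way to read the lm suggestion above, sketched on simulated data (the grid size, peak location, curvatures and noise level are all invented for illustration): expanding a + b(x - x0)^2 + c(y - y0)^2 gives a model that is linear in 1, x, x^2, y and y^2, so lm() can fit it directly and the peak follows from x0 = -beta_x / (2 * beta_xx), and likewise for y0.

```r
# Simulated 32 x 32 "image" with a single peak at row 20, column 12 (illustrative values)
nr <- 32; nc <- 32
x <- as.vector(row(matrix(0, nr, nc)))   # row index of every pixel
y <- as.vector(col(matrix(0, nr, nc)))   # column index of every pixel
set.seed(1)
z <- 5 - 0.02 * (x - 20)^2 - 0.03 * (y - 12)^2 + rnorm(nr * nc, sd = 0.01)

# a + b(x - x0)^2 + c(y - y0)^2 is linear in the terms 1, x, x^2, y, y^2
fit <- lm(z ~ x + I(x^2) + y + I(y^2))
cf <- coef(fit)

# Recover the peak location from the linear and quadratic coefficients
x0 <- -cf[["x"]] / (2 * cf[["I(x^2)"]])
y0 <- -cf[["y"]] / (2 * cf[["I(y^2)"]])
c(x0 = x0, y0 = y0)   # should land close to 20 and 12
```

For a real matrix Z, z would be as.vector(Z) with x and y built from row(Z) and col(Z); this form has no x*y cross term, so it assumes the peak's axes line up with the rows and columns.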
________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Carl Witthoft [carl at witthoft.com] Sent: Monday, May 02, 2011 7:14 PM To: r-help at r-project.org Subject: [R] easy way to do a 2-D fit to an array of data? Hi, I've got a matrix, Z, of values representing (as it happens) optical power at each pixel location. Since I know in advance I've got a single, convex peak, I would like to do a 2D parabolic fit of the form Z = poly((x+y),2) where x and y are the x,y coordinates of each pixel (or equivalently, the row, column numbers). Is there an R function that lets me easily implement that? I've started down the path of something like zvec <- as.vector(Z), and creating applicable x,y vectors by something like (where for the sake of argument Z is 128x128) foo<-matrix(seq(1,128),128,128) xvec <- as.vector(foo) yvec <- as.vector(t(foo)) at which point I can feed zvec, xvec, yvec to lm() . I'm hopeful someone can point me to a much easier way to do the same thing. Oh, and if there's a 2-D splinefunction generator, that would work for me as well. thanks Carl ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From patrick.breheny at uky.edu Tue May 3 04:26:12 2011 From: patrick.breheny at uky.edu (Breheny, Patrick) Date: Mon, 2 May 2011 22:26:12 -0400 Subject: [R] Lasso with Categorical Variables In-Reply-To: References: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net>, Message-ID: <408338F86F0D4243BD5E7B74A8C0862B205670D135@EX7FM03.ad.uky.edu> Clementonia, It sounds like you are looking for the group lasso (Yuan & Lin, 2006). There are two packages on CRAN that have implemented this idea: grpreg and grplasso. 
The syntax of each is similar to lars (in particular requiring a numeric design matrix as produced by model.matrix), except you must also supply a vector that describes the grouping (e.g., c(1,1,1,2,2,3,3,...)). The members of each group will then either be all zero or all nonzero (i.e., the variable selection occurs at the group level). _______________________ Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky ________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Clemontina Alexander [ckalexa2 at ncsu.edu] Sent: Monday, May 02, 2011 5:22 PM To: David Winsemius Cc: r-help at r-project.org Subject: Re: [R] Lasso with Categorical Variables Thanks for your response, but I guess I didn't make my question clear. I am already familiar with the concept of dummy variables and regression in R. My question is, can the "lars" package (or some other lasso algorithm) handle factors? I did use dummy variables in my original data, but lars (lasso) only shrank the coefficients of some of the levels of one factor to 0. Is this the correct thing to do? Because intuitively it seems like I would want to shrink the whole factor coefficient to 0. If this is correct, what is the interpretation? For example, for X1, if lasso drops the coefficient for levels A and B, but not C and D, does this mean that X1 should be included in the model? Thanks. On Mon, May 2, 2011 at 2:47 PM, David Winsemius wrote: > > On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote: > >> Hi, >> >> On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander >> wrote: >>> >>> Hi! This is my first time posting. I've read the general rules and >>> guidelines, but please bear with me if I make some fatal error in >>> posting. Anyway, I have a continuous response and 29 predictors made >>> up of continuous variables and nominal and ordinal categorical >>> variables. 
I'd like to do lasso on these, but I get an error. The way >>> I am using "lars" doesn't allow for the factors. Is there a special >>> option or some other method in order to do lasso with cat. variables? >>> >>> Here is and example (considering ordinal variables as just nominal): >>> >>> set.seed(1) >>> Y <- rnorm(10,0,1) >>> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE)) >>> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE)) >>> X3 <- sample(x=30:55, size=10, replace=TRUE) # think age >>> X4 <- rchisq(10, df=4, ncp=0) >>> X <- data.frame(X1,X2,X3,X4) >>> >>>> str(X) >>> >>> 'data.frame': 10 obs. of 4 variables: >>> $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2 >>> $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3 >>> $ X3: int 51 46 50 44 43 50 30 42 49 48 >>> $ X4: num 2.86 1.55 1.94 2.45 2.75 ... >>> >>> >>> I'd like to do: >>> obj <- lars(x=X, y=Y, type = "lasso") >>> >>> Instead, what I have been doing is converting all data to continuous >>> but I think this is really bad! >> >> Yeah, it is. >> >> Check out the "Categorical Predictor Variables" section here for a way >> to handle such predictor vars: >> http://www.psychstat.missouristate.edu/multibook/mlt08m.html > > Steve's citation is somewhat helpful, but not sufficient to take the next > steps. You can find details regarding the mechanics of typical linear > regression in R on the ?lm page where you find that the factor variables are > typically handled by model.matrix. 
See below: > >> model.matrix(~X1 + X2 + X3 + X4, X) > (Intercept) X1B X1C X1D X2F X2G X2H X2I X3 X4 > 1 1 0 0 1 0 1 0 0 51 2.8640884 > 2 1 0 0 0 0 0 1 0 46 1.5462243 > 3 1 0 1 0 0 1 0 0 50 1.9430901 > 4 1 0 0 0 1 0 0 0 44 2.4504180 > 5 1 1 0 0 0 0 0 1 43 2.7535052 > 6 1 1 0 0 0 0 0 1 50 1.6200326 > 7 1 0 0 0 0 0 0 1 30 0.5750533 > 8 1 1 0 0 0 0 0 0 42 5.9224777 > 9 1 0 0 1 0 0 0 1 49 2.0401528 > 10 1 1 0 0 0 1 0 0 48 6.2995288 > attr(,"assign") > [1] 0 1 1 1 2 2 2 2 3 4 > attr(,"contrasts") > attr(,"contrasts")$X1 > [1] "contr.treatment" > > attr(,"contrasts")$X2 > [1] "contr.treatment" > > The numeric variables are passed through, while the dummy variables for > factor columns are constructed (as treatment contrasts) and the whole thing > it returned in a neat package. > > -- > David. >> >> HTH, >> -steve >> > -- > David Winsemius, MD > Heritage Laboratories > West Hartford, CT > > ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From mbmiller+l at gmail.com Tue May 3 04:32:03 2011 From: mbmiller+l at gmail.com (Mike Miller) Date: Mon, 2 May 2011 21:32:03 -0500 Subject: [R] UNIX-like "cut" command in R In-Reply-To: <20110502223523.GJ48756@ms.unimelb.edu.au> References: <20110502223523.GJ48756@ms.unimelb.edu.au> Message-ID: On Tue, 3 May 2011, Andrew Robinson wrote: > try substr() OK. Apparently, it allows things like this... > substr("abcdef",2,4) [1] "bcd" ...which is like this: echo "abcdef" | cut -c2-4 But that doesn't use a delimiter, it only does character-based cutting, and it is very limited. With "cut -c" I can do stuff this: echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17- abclmnoqrstuvwxyz It extracts characters 1 to 3, 12 to 15 and 17 to the end. 
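As a sketch only, the character ranges in the "cut -c" example above can also be pulled out in base R with substring(), which is vectorised over its start and stop positions (the variable names below are illustrative):

```r
s <- "abcdefghijklmnopqrstuvwxyz"

# Character ranges 1-3, 12-15 and 17-end, as in: cut -c-3,12-15,17-
first <- c(1, 12, 17)
last  <- c(3, 15, nchar(s))

# substring() recycles the string over the vectors of start and stop positions
pieces <- substring(s, first, last)
pieces
# [1] "abc"        "lmno"       "qrstuvwxyz"

# Glue the pieces back together to mimic the output of cut -c
out <- paste(pieces, collapse = "")
out
# [1] "abclmnoqrstuvwxyz"
```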
That was a great tip, though, because it led me to strsplit, which can do what I want, however somewhat awkwardly:

> y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
> paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)
[1] "a b c l m n o q r s t u v w x y z"

That gives me what I want, but it is still a little awkward. I guess I don't quite get what I'm doing with lists. I'm not clear on how this would work with a vector of strings.

Mike

From ggrothendieck at gmail.com Tue May 3 05:23:15 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Mon, 2 May 2011 23:23:15 -0400
Subject: [R] UNIX-like "cut" command in R
In-Reply-To:
References: <20110502223523.GJ48756@ms.unimelb.edu.au>
Message-ID:

On Mon, May 2, 2011 at 10:32 PM, Mike Miller wrote:
> On Tue, 3 May 2011, Andrew Robinson wrote:
>
>> try substr()
>
> OK. Apparently, it allows things like this...
>
>> substr("abcdef",2,4)
>
> [1] "bcd"
>
> ...which is like this:
>
> echo "abcdef" | cut -c2-4
>
> But that doesn't use a delimiter, it only does character-based cutting, and
> it is very limited. With "cut -c" I can do stuff this:
>
> echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-
>
> abclmnoqrstuvwxyz
>
> It extracts characters 1 to 3, 12 to 15 and 17 to the end.
>
> That was a great tip, though, because it led me to strsplit, which can do
> what I want, however somewhat awkwardly:
>
>> y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
>> paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)
>
> [1] "a b c l m n o q r s t u v w x y z"
>
> That gives me what I want, but it is still a little awkward. I guess I
> don't quite get what I'm doing with lists. I'm not clear on how this would
> work with a vector of strings.
>

Try this:

> read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"), widths = c(3, 8, 4, 1, 10), colClasses = c(NA, "NULL"))
   V1   V3         V5
1 abc lmno qrstuvwxyz

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From rvaradhan at jhmi.edu Tue May 3 05:55:18 2011 From: rvaradhan at jhmi.edu (Ravi Varadhan) Date: Mon, 2 May 2011 23:55:18 -0400 Subject: [R] easy way to do a 2-D fit to an array of data? In-Reply-To: <4DBF3AB8.3050005@witthoft.com> References: <4DBF3AB8.3050005@witthoft.com> Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669E9F318@WIGGUMVS.win.ad.jhu.edu> Hi Carl, Here is another slightly different (not necessarily the easiest) approach that uses a profiling technique. An advantage is that you get the maximum location directly. n <- 20 x <- sort(rnorm(n)) y <- sort(rnorm(n)) xy <- expand.grid(x, y) zfn <- function(x) 0.5 - 2.2 * (x[1] - 0.5)^2 - 0.9 * (x[2] + 0.5)^2 z <- rep(NA, length=n^2) for (i in 1:nrow(xy)) z[i] <- zfn(xy[i, ]) z <- z + rnorm(n^2, sd=0.3) obj <- function(par, x, y, z) { -summary(lm(z ~ I((x - par[1])^2) + I((y - par[2])^2)))$r.sq } require(dfoptim) ans <- nmk(par=colMeans(xy), fn=obj, x=xy[,1], y=xy[,2], z=z) ans$par # location of the maximum summary(lm(z ~ I((xy[,1] - ans$par[1])^2) + I((xy[,2] - ans$par[2])^2))) Ravi. ________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Carl Witthoft [carl at witthoft.com] Sent: Monday, May 02, 2011 7:14 PM To: r-help at r-project.org Subject: [R] easy way to do a 2-D fit to an array of data? Hi, I've got a matrix, Z, of values representing (as it happens) optical power at each pixel location. Since I know in advance I've got a single, convex peak, I would like to do a 2D parabolic fit of the form Z = poly((x+y),2) where x and y are the x,y coordinates of each pixel (or equivalently, the row, column numbers). Is there an R function that lets me easily implement that? 
I've started down the path of something like zvec <- as.vector(Z), and creating applicable x,y vectors by something like (where for the sake of argument Z is 128x128)

foo<-matrix(seq(1,128),128,128)
xvec <- as.vector(foo)
yvec <- as.vector(t(foo))

at which point I can feed zvec, xvec, yvec to lm() .

I'm hopeful someone can point me to a much easier way to do the same thing. Oh, and if there's a 2-D splinefunction generator, that would work for me as well.

thanks
Carl

From mbmiller+l at gmail.com Tue May 3 06:26:59 2011
From: mbmiller+l at gmail.com (Mike Miller)
Date: Mon, 2 May 2011 23:26:59 -0500
Subject: [R] UNIX-like "cut" command in R
In-Reply-To:
References: <20110502223523.GJ48756@ms.unimelb.edu.au>
Message-ID:

On Mon, 2 May 2011, Gabor Grothendieck wrote:

> On Mon, May 2, 2011 at 10:32 PM, Mike Miller wrote:
>> On Tue, 3 May 2011, Andrew Robinson wrote:
>>
>>> try substr()
>>
>> OK. Apparently, it allows things like this...
>>
>>> substr("abcdef",2,4)
>>
>> [1] "bcd"
>>
>> ...which is like this:
>>
>> echo "abcdef" | cut -c2-4
>>
>> But that doesn't use a delimiter, it only does character-based cutting, and
>> it is very limited. With "cut -c" I can do stuff this:
>>
>> echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-
>>
>> abclmnoqrstuvwxyz
>>
>> It extracts characters 1 to 3, 12 to 15 and 17 to the end.
>>
>> That was a great tip, though, because it led me to strsplit, which can do
>> what I want, however somewhat awkwardly:
>>
>>> y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
>>> paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)
>>
>> [1] "a b c l m n o q r s t u v w x y z"
>>
>> That gives me what I want, but it is still a little awkward. I guess I
>> don't quite get what I'm doing with lists. I'm not clear on how this would
>> work with a vector of strings.
>>
>
> Try this:
>
>> read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"), widths = c(3, 8, 4, 1, 10), colClasses = c(NA, "NULL"))
>    V1   V3         V5
> 1 abc lmno qrstuvwxyz

That gives me a few more functions to study. Of course the new code (using read.fwf() and textConnection()) is not doing what was requested and it requires some work to compute the widths from the given numbers (c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)).

Mike

From hcatbr at yahoo.co.in Tue May 3 06:43:11 2011
From: hcatbr at yahoo.co.in (HC)
Date: Mon, 2 May 2011 21:43:11 -0700 (PDT)
Subject: [R] adaptIntegrate - how to pass additional parameters to the integrand
Message-ID: <1304397791609-3491701.post@n4.nabble.com>

Hello,

I am trying to use the adaptIntegrate function, but I need to pass a few additional parameters to the integrand. However, this function does not seem to have the flexibility to pass on such additional parameters.

Am I missing something, or is this a known limitation? If the restriction is real, is there a good alternative?

Many thanks for your time.
HC

--
View this message in context: http://r.789695.n4.nabble.com/adaptIntegrate-how-to-pass-additional-parameters-to-the-integrand-tp3491701p3491701.html
Sent from the R help mailing list archive at Nabble.com.

From ehlers at ucalgary.ca Tue May 3 07:56:15 2011
From: ehlers at ucalgary.ca (P Ehlers)
Date: Mon, 2 May 2011 22:56:15 -0700
Subject: [R] UNIX-like "cut" command in R
In-Reply-To:
References: <20110502223523.GJ48756@ms.unimelb.edu.au>
Message-ID: <4DBF98FF.3030204@ucalgary.ca>

Mike Miller wrote:
> On Mon, 2 May 2011, Gabor Grothendieck wrote:
>
>> On Mon, May 2, 2011 at 10:32 PM, Mike Miller wrote:
>>> On Tue, 3 May 2011, Andrew Robinson wrote:
>>>
>>>> try substr()
>>> OK. Apparently, it allows things like this...
>>> >>>> substr("abcdef",2,4) >>> [1] "bcd" >>> >>> ...which is like this: >>> >>> echo "abcdef" | cut -c2-4 >>> >>> But that doesn't use a delimiter, it only does character-based cutting, and >>> it is very limited. With "cut -c" I can do stuff this: >>> >>> echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17- >>> >>> abclmnoqrstuvwxyz >>> >>> It extracts characters 1 to 3, 12 to 15 and 17 to the end. >>> >>> That was a great tip, though, because it led me to strsplit, which can do >>> what I want, however somewhat awkwardly: >>> >>>> y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z" >>>> paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim) >>> [1] "a b c l m n o q r s t u v w x y z" >>> >>> That gives me what I want, but it is still a little awkward. I guess I >>> don't quite get what I'm doing with lists. I'm not clear on how this would >>> work with a vector of strings. >>> >> Try this: >> >>> read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"), widths = c(3, 8, 4, 1, 10), colClasses = c(NA, "NULL")) >> V1 V3 V5 >> 1 abc lmno qrstuvwxyz > > > That gives me a few more functions to study. Of course the new code > (using read.fwf() and textConnection()) is not doing what was requested > and it requires some work to compute the widths from the given numbers > (c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)). 
> > Mike Use str_sub() in the stringr package: require(stringr) # install first if necessary s <- "abcdefghijklmnopqrstuvwxyz" str_sub(s, c(1,12,17), c(3,15,-1)) #[1] "abc" "lmno" "qrstuvwxyz" Peter Ehlers From mbmiller+l at gmail.com Tue May 3 08:04:50 2011 From: mbmiller+l at gmail.com (Mike Miller) Date: Tue, 3 May 2011 01:04:50 -0500 Subject: [R] UNIX-like "cut" command in R In-Reply-To: <4DBF98FF.3030204@ucalgary.ca> References: <20110502223523.GJ48756@ms.unimelb.edu.au> <4DBF98FF.3030204@ucalgary.ca> Message-ID: On Mon, 2 May 2011, P Ehlers wrote: > Use str_sub() in the stringr package: > > require(stringr) # install first if necessary > s <- "abcdefghijklmnopqrstuvwxyz" > > str_sub(s, c(1,12,17), c(3,15,-1)) > #[1] "abc" "lmno" "qrstuvwxyz" Thanks. That's very close to what I'm looking for, but it seems to correspond to "cut -c", not to "cut -f". Can it work with delimiters or only with character counts? Mike From chschulz at email.de Tue May 3 08:28:10 2011 From: chschulz at email.de (Christian Schulz) Date: Tue, 03 May 2011 08:28:10 +0200 Subject: [R] UNIX-like "cut" command in R In-Reply-To: References: <20110502223523.GJ48756@ms.unimelb.edu.au> <4DBF98FF.3030204@ucalgary.ca> Message-ID: <4DBFA07A.6070102@email.de> > On Mon, 2 May 2011, P Ehlers wrote: > >> Use str_sub() in the stringr package: >> >> require(stringr) # install first if necessary >> s <- "abcdefghijklmnopqrstuvwxyz" >> >> str_sub(s, c(1,12,17), c(3,15,-1)) >> #[1] "abc" "lmno" "qrstuvwxyz" > > > Thanks. That's very close to what I'm looking for, but it seems to > correspond to "cut -c", not to "cut -f". Can it work with delimiters > or only with character counts? 
>
> Mike
>

x <- "this is a string"
unlist(strsplit(x," "))[c(1,4)]

HTH
Christian

From Anna-Leena.Orsama at uta.fi Tue May 3 08:09:20 2011
From: Anna-Leena.Orsama at uta.fi (Anna-Leena Orsama)
Date: Tue, 03 May 2011 09:09:20 +0300
Subject: [R] Comparison of two penalized spline fits in mixed model framework
Message-ID: <20110503090920.29189soi4eyxtij4@imp3.uta.fi>

Hello!

I have run into a problem with nlme. My intention is to fit a penalized spline model in the mixed model framework. I want to compare the smooth curves of two groups, but for some reason I get NaN in the output. Here is the R code I have used.

#Z.overall is for truncated lines part
knots <- seq(1.5,6.5,by=1)
Z <- outer(weekday,knots,"-")
Z.overall <- Z*(Z>0)
fit1 <- lme(weight~group*weekday, random=list(group=pdIdent(~Z.overall-1)))
summary(fit1)

Linear mixed-effects model fit by REML
 Data: NULL
       AIC      BIC    logLik
  4379.838 4411.895 -2183.919

Random effects:
 Formula: ~Z.overall - 1 | group
 Structure: Multiple of an Identity
        Z.overall1 Z.overall2 Z.overall3 Z.overall4 Z.overall5 Z.overall6  Residual
StdDev:  0.1310617  0.1310617  0.1310617  0.1310617  0.1310617  0.1310617 0.9833188

Fixed effects: normMAweight2 ~ group * weekday
                    Value Std.Error   DF    t-value p-value
(Intercept)    0.23589909 0.4431604 1545  0.5323109  0.5946
group          0.14167744 0.2629714    0  0.5387562     NaN
weekday       -0.08601980 0.2770228 1545 -0.3105152  0.7562
group:weekday -0.02575775 0.1686222 1545 -0.1527542  0.8786
 Correlation:
              (Intr) group  weekdy
group         -0.957
weekday       -0.911  0.883
group:weekday  0.861 -0.915 -0.953

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max
-3.87210364 -0.64218983 -0.03839294  0.60534552  4.30615258

Number of Observations: 1549
Number of Groups: 2
Warning message:
In pt(q, df, lower.tail, log.p) : NaNs produced

Kind regards,
Anna-Leena Orsama

From frank.lehmann62 at freenet.de Tue May 3 08:17:13 2011
From: frank.lehmann62 at freenet.de (Frank Lehmann)
Date: Tue, 3 May 2011 08:17:13 +0200
Subject: [R] problem with Sweave and pdflatex
In-Reply-To: <4DBED318.7070808@statistik.tu-dortmund.de>
References: <000301cc08c7$b4c2ef50$1e48cdf0$@lehmann62@freenet.de> <4DBED318.7070808@statistik.tu-dortmund.de>
Message-ID: <000901cc0959$be50afc0$3af20f40$@lehmann62@freenet.de>

That might fix the problem... I will test it.

Thanks!

Frank

-----Original Message-----
From: Uwe Ligges [mailto:ligges at statistik.tu-dortmund.de]
Sent: Monday, 2 May 2011 17:52
To: Frank Lehmann
Cc: r-help at r-project.org
Subject: Re: [R] problem with Sweave and pdflatex

Have you checked the permissions in the working directory? Is there a blank in your path (LaTeX does not like spaces in the path).

Uwe Ligges

On 02.05.2011 14:51, Frank Lehmann wrote:
> Hello,
>
> when I plot figures with Sweave, I get the message "pdflatex: Permission
> denied". This problem only occurs while working on local system. When I copy
> the *.rnw-File to my AFS drive, there is no problem at all.
>
> Here is a small example:
>
> \documentclass{scrartcl}
> \usepackage[OT1]{fontenc}
> \usepackage[latin1]{inputenc}
> \usepackage[ngerman]{babel}
> \usepackage[pdftex]{graphicx}
> \usepackage{Sweave}
>
> \begin{document}
>
> \setkeys{Gin}{width=\textwidth}
> \begin{figure}[htbp]
> <>=
> x<- 1:10
> plot(x)
> @
> \caption{Eine einfache Grafik}
> \end{figure}
>
> \end{document}
>
> Does anyone have an idea, how to solve that problem? I'm working with Windows
> XP.
>
> Thanks!
>
> Frank
>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From mbmiller+l at gmail.com Tue May 3 08:39:49 2011
From: mbmiller+l at gmail.com (Mike Miller)
Date: Tue, 3 May 2011 01:39:49 -0500 (CDT)
Subject: [R] UNIX-like "cut" command in R
In-Reply-To: <4DBFA07A.6070102@email.de>
References: <20110502223523.GJ48756@ms.unimelb.edu.au> <4DBF98FF.3030204@ucalgary.ca> <4DBFA07A.6070102@email.de>
Message-ID:

On Tue, 3 May 2011, Christian Schulz wrote:

>> On Mon, 2 May 2011, P Ehlers wrote:
>>
>>> Use str_sub() in the stringr package:
>>>
>>> require(stringr) # install first if necessary
>>> s <- "abcdefghijklmnopqrstuvwxyz"
>>>
>>> str_sub(s, c(1,12,17), c(3,15,-1))
>>> #[1] "abc" "lmno" "qrstuvwxyz"
>>
>>
>> Thanks. That's very close to what I'm looking for, but it seems to
>> correspond to "cut -c", not to "cut -f". Can it work with delimiters or
>> only with character counts?
>>
>> Mike
>>
>
> x <- "this is a string"
> unlist(strsplit(x," "))[c(1,4)]

Thanks. I did figure that one out a couple of messages back, but to get it to behave like "cut -d' ' -f1,4", I had to add a paste command to reassemble the parts:

paste(unlist(strsplit(x," "))[c(1,4)], collapse=" ")

Then I wasn't sure if I could do this to every element of a vector of strings without looping -- I have to think not.

Mike

From khosoda at med.kobe-u.ac.jp Tue May 3 08:40:11 2011
From: khosoda at med.kobe-u.ac.jp (khosoda at med.kobe-u.ac.jp)
Date: Tue, 03 May 2011 15:40:11 +0900
Subject: [R] Bootstrapping confidence intervals
Message-ID: <4DBFA34B.8070607@med.kobe-u.ac.jp>

Hi,

Sorry for the repeated question. I performed logistic regression using lrm and penalized it with the pentrace function.
I wanted to get confidence intervals of odds ratio of each predictor and summary(MyModel) gave them. I also tried to get bootstrapping standard errors in the logistic regression. bootcov function in rms package provided them. Then, I found that the confidence intervals provided by bootstrapping (bootcov) was narrower than CIs provided by usual variance-covariance matrix in the followings. My data has no cluster structure. I am wondering which confidence interval is better. I guess bootstrapping one, but is it right? I would appreciate anybody's help in advance. > summary(MyModel, stenosis=c(70, 80), x1=c(1.5, 2.0), x2=c(1.5, 2.0)) Effects Response : outcome Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95 stenosis 70.0 80 10.0 -0.11 0.24 -0.59 0.37 Odds Ratio 70.0 80 10.0 0.90 NA 0.56 1.45 x1 1.5 2 0.5 1.21 0.37 0.49 1.94 Odds Ratio 1.5 2 0.5 3.36 NA 1.63 6.95 x2 1.5 2 0.5 -0.29 0.19 -0.65 0.08 Odds Ratio 1.5 2 0.5 0.75 NA 0.52 1.08 ClinicalScore 3.0 5 2.0 0.61 0.38 -0.14 1.36 Odds Ratio 3.0 5 2.0 1.84 NA 0.87 3.89 procedure - CA:CE 2.0 1 NA 0.83 0.46 -0.07 1.72 Odds Ratio 2.0 1 NA 2.28 NA 0.93 5.59 > summary(MyModel.boot, stenosis=c(70, 80), x1=c(1.5, 2.0), x2=c(1.5, 2.0)) Effects Response : outcome Factor Low High Diff. Effect S.E. 
Lower 0.95 Upper 0.95 stenosis 70.0 80 10.0 -0.11 0.28 -0.65 0.43 Odds Ratio 70.0 80 10.0 0.90 NA 0.52 1.54 x1 1.5 2 0.5 1.21 0.29 0.65 1.77 Odds Ratio 1.5 2 0.5 3.36 NA 1.92 5.89 x2 1.5 2 0.5 -0.29 0.16 -0.59 0.02 Odds Ratio 1.5 2 0.5 0.75 NA 0.55 1.02 ClinicalScore 3.0 5 2.0 0.61 0.45 -0.28 1.50 Odds Ratio 3.0 5 2.0 1.84 NA 0.76 4.47 procedure - CAS:CEA 2.0 1 NA 0.83 0.38 0.07 1.58 Odds Ratio 2.0 1 NA 2.28 NA 1.08 4.85 From nick.sabbe at ugent.be Tue May 3 08:40:55 2011 From: nick.sabbe at ugent.be (Nick Sabbe) Date: Tue, 3 May 2011 08:40:55 +0200 Subject: [R] Lasso with Categorical Variables In-Reply-To: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net> References: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net> Message-ID: <02e401cc095d$0e05ee60$2a11cb20$@sabbe@ugent.be> For performance reasons, I advise on using the following function instead of model.matrix: factorsToDummyVariables<-function(dfr, betweenColAndLevel="") { nc<-dim(dfr)[2] firstRow<-dfr[1,] coln<-colnames(dfr) retval<-do.call(cbind, lapply(seq(nc), function(ci){ if(is.factor(firstRow[,ci])) { lvls<-levels(firstRow[,ci])[-1] stretchedcols<-sapply(lvls, function(lvl){ rv<-dfr[,ci]==lvl mode(rv)<-"integer" return(rv) }) if(!is.matrix(stretchedcols)) stretchedcols<-matrix(stretchedcols, nrow=1) colnames(stretchedcols)<-paste(coln[ci], lvls, sep=betweenColAndLevel) return(stretchedcols) } else { curcol<-matrix(dfr[,ci], ncol=1) colnames(curcol)<-coln[ci] return(curcol) } })) rownames(retval)<-rownames(dfr) return(retval) } Just for comparison: here is my old version of the same function, using model.matrix: factorsToDummyVariables.old<-function(dfrPredictors, form=paste("~",paste(colnames(dfrPredictors), collapse="+"), sep="")) { #note: this function seems to operate quite slowly! 
#Because it is used often, it may be worth improving its speed dfrTmp<-model.frame(dfrPredictors, na.action=na.pass) frm<-as.formula(form) mm<-model.matrix(frm, data=dfrTmp) retval<-as.matrix(mm)[,-1] return(retval) } In a testcase with a reasonably big dataset, I compared the speeds: #system.time(tmp.fd.convds.full.man<-manualFactorsToDummyVariables(ds)) ## user system elapsed ## 9.44 0.00 9.48 #system.time(tmp.fd.convds.full<-factorsToDummyVariables.old(ds)) ## user system elapsed ## 15.49 0.00 15.64 #system.time(invisible(factorsToDummyVariables (ds[10,]))) ## user system elapsed ## 0.36 0.00 0.36 #system.time(invisible(factorsToDummyVariables.old (ds[10,]))) ## user system elapsed ## 2.18 0.00 2.20 #system.time(invisible(factorsToDummyVariables (ds[20:30,]))) ## user system elapsed ## 0.34 0.00 0.38 #system.time(invisible(factorsToDummyVariables.old (ds[20:30,]))) ## user system elapsed ## 2.11 0.00 2.15 If you have to do this quite often, the difference surely adds up... More improvements may be possible. This function only works if you don't include interactions, though. Nick Sabbe -- ping: nick.sabbe at ugent.be link: http://biomath.ugent.be wink: A1.056, Coupure Links 653, 9000 Gent ring: 09/264.59.36 -- Do Not Disapprove -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius Sent: maandag 2 mei 2011 20:48 To: Steve Lianoglou Cc: r-help at r-project.org Subject: Re: [R] Lasso with Categorical Variables On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote: > Hi, > > On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander > wrote: >> Hi! This is my first time posting. I've read the general rules and >> guidelines, but please bear with me if I make some fatal error in >> posting. Anyway, I have a continuous response and 29 predictors made >> up of continuous variables and nominal and ordinal categorical >> variables. I'd like to do lasso on these, but I get an error. 
The way >> I am using "lars" doesn't allow for the factors. Is there a special >> option or some other method in order to do lasso with cat. variables? >> >> Here is and example (considering ordinal variables as just nominal): >> >> set.seed(1) >> Y <- rnorm(10,0,1) >> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE)) >> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE)) >> X3 <- sample(x=30:55, size=10, replace=TRUE) # think age >> X4 <- rchisq(10, df=4, ncp=0) >> X <- data.frame(X1,X2,X3,X4) >> >>> str(X) >> 'data.frame': 10 obs. of 4 variables: >> $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2 >> $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3 >> $ X3: int 51 46 50 44 43 50 30 42 49 48 >> $ X4: num 2.86 1.55 1.94 2.45 2.75 ... >> >> >> I'd like to do: >> obj <- lars(x=X, y=Y, type = "lasso") >> >> Instead, what I have been doing is converting all data to continuous >> but I think this is really bad! > > Yeah, it is. > > Check out the "Categorical Predictor Variables" section here for a way > to handle such predictor vars: > http://www.psychstat.missouristate.edu/multibook/mlt08m.html Steve's citation is somewhat helpful, but not sufficient to take the next steps. You can find details regarding the mechanics of typical linear regression in R on the ?lm page where you find that the factor variables are typically handled by model.matrix. 
See below:

> model.matrix(~X1 + X2 + X3 + X4, X)
   (Intercept) X1B X1C X1D X2F X2G X2H X2I X3        X4
1            1   0   0   1   0   1   0   0 51 2.8640884
2            1   0   0   0   0   0   1   0 46 1.5462243
3            1   0   1   0   0   1   0   0 50 1.9430901
4            1   0   0   0   1   0   0   0 44 2.4504180
5            1   1   0   0   0   0   0   1 43 2.7535052
6            1   1   0   0   0   0   0   1 50 1.6200326
7            1   0   0   0   0   0   0   1 30 0.5750533
8            1   1   0   0   0   0   0   0 42 5.9224777
9            1   0   0   1   0   0   0   1 49 2.0401528
10           1   1   0   0   0   1   0   0 48 6.2995288
attr(,"assign")
 [1] 0 1 1 1 2 2 2 2 3 4
attr(,"contrasts")
attr(,"contrasts")$X1
[1] "contr.treatment"

attr(,"contrasts")$X2
[1] "contr.treatment"

The numeric variables are passed through, while the dummy variables for factor columns are constructed (as treatment contrasts), and the whole thing is returned in a neat package.

--
David.

>
> HTH,
> -steve
>

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

From fomcl at yahoo.com Tue May 3 09:45:15 2011
From: fomcl at yahoo.com (Albert-Jan Roskam)
Date: Tue, 3 May 2011 00:45:15 -0700 (PDT)
Subject: [R] Rodbc quesion: how to reliably determine the data type?
Message-ID: <934521.97792.qm@web110705.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From savicky at praha1.ff.cuni.cz Tue May 3 10:03:29 2011
From: savicky at praha1.ff.cuni.cz (Petr Savicky)
Date: Tue, 3 May 2011 10:03:29 +0200
Subject: [R] UNIX-like "cut" command in R
In-Reply-To:
References: <20110502223523.GJ48756@ms.unimelb.edu.au> <4DBF98FF.3030204@ucalgary.ca> <4DBFA07A.6070102@email.de>
Message-ID: <20110503080328.GA30013@praha1.ff.cuni.cz>

On Tue, May 03, 2011 at 01:39:49AM -0500, Mike Miller wrote:
> On Tue, 3 May 2011, Christian Schulz wrote:
> [...]
> > > >x <- "this is a string" > >unlist(strsplit(x," "))[c(1,4)] > > > Thanks. I did figure that one out a couple of messages back, but to get > it do behave like "cut -d' ' -f1,4", I had to add a paste command to > reassemble the parts: > > paste(unlist(strsplit(x," "))[c(1,4)], collapse=" ") > > Then I wasn't sure if I could do this to every element of a vector of > strings without looping -- I have to think not. Try the following x <- c("this is a string", "this is a numeric") reassemble <- function(x, ind) paste(x[ind], collapse=" ") vapply(strsplit(x," "), reassemble, "character", c(1, 4)) [1] "this string" "this numeric" Hope this helps. Petr Savicky. From Stefan.Hoj-Edwards at agrsci.dk Tue May 3 09:12:01 2011 From: Stefan.Hoj-Edwards at agrsci.dk (=?utf-8?B?U3RlZmFuIE1jS2lubm9uIEjDuGotRWR3YXJkcw==?=) Date: Tue, 3 May 2011 09:12:01 +0200 Subject: [R] Problems with Rterm 2.13.0 - but not RGui In-Reply-To: References: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F69B@DJFEXMBX01.djf.agrsci.dk> Message-ID: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F6B3@DJFEXMBX01.djf.agrsci.dk> Yes, the message is pretty clear, but it has nothing to do with running as admin. I have just tried to start a command line with admin privileges and the error still occurs. Regarding Rgui, I started it by opening the shortcut. Now I've tracked down the problem a bit, and the problem appears to be connected to which folder R is called from. And by sheer luck I've resolved the problem: In all previous versions of Windows, on the Danish editions, the "C:\Program Files" directory was called "C:\Programmer". This appears to be the case in Windows 7, but "C:\Programmer" is a symbolic link (hard/soft?) to "C:\Program Files". And apparently, I've been calling R from "C:\Programmer" instead of "C:\Program Files" which gave the problem. 
When/how/why I changed the PATH variable to the symbolic link is unclear, but a quick check reveals that the problem did not exist in R 2.12.1: C:\Programmer\R\R-2.12.1\bin\i386\Rterm # No problem C:\Programmer\R\R-2.13.0\bin\i386\Rterm # Problem I will submit a bug report on this. Kind regards, Stefan McKinnon Edwards -----Oprindelig meddelelse----- Fra: Jonathan Daily [mailto:biomathjdaily at gmail.com] Sendt: 2. maj 2011 16:59 Til: Stefan McKinnon H?j-Edwards Cc: r-help at r-project.org Emne: Re: [R] Problems with Rterm 2.13.0 - but not RGui The message is pretty clear. Access denied means you don't have permission to access the path. This also explains why the packages fail to load - you don't have access to R's package library. It most likely works on RGui because you are clicking it/running it as admin (you did not specify how you ran RGui). 2011/5/2 Stefan McKinnon H?j-Edwards : > Hi all, > > I have just installed R 2.13.0 and I am experiencing problems with the terminal, but not the with the GUI interface. > I am Windows 7. > > When running "R" or "Rterm" from a commandline I receive the following: > > Warning message: > In normalizePath(path.expand(path), winslash, mustWork) : > ?path[3]="C:/Programmer/R/R-2.13.0/library": Adgang n?gtet > > R version 2.13.0 (2011-04-13) > Copyright (C) 2011 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > Platform: i386-pc-mingw32/i386 (32-bit) > > R is free software and comes with ABSOLUTELY NO WARRANTY. > You are welcome to redistribute it under certain conditions. > Type 'license()' or 'licence()' for distribution details. > > R is a collaborative project with many contributors. > Type 'contributors()' for more information and > 'citation()' on how to cite R or R packages in publications. > > Type 'demo()' for some demos, 'help()' for on-line help, or > 'help.start()' for an HTML browser interface to help. > Type 'q()' to quit R. 
> > Warning message: > package "methods" in options("defaultPackages") was not found > During startup - Warning messages: > 1: package 'datasets' in options("defaultPackages") was not found > 2: package 'utils' in options("defaultPackages") was not found > 3: package 'grDevices' in options("defaultPackages") was not found > 4: package 'graphics' in options("defaultPackages") was not found > 5: package 'stats' in options("defaultPackages") was not found > 6: package 'methods' in options("defaultPackages") was not found > > > Notice: "C:/Programmer/" is the Danish equivalent of "C:/Program Files". > The first error "Adgang n?gtet" is directly translated to "Access denied". > > Any suggestions as how to fix this? > > Kind regards, > Stefan McKinnon Edwards > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- =============================================== Jon Daily Technician =============================================== #!/usr/bin/env outside # It's great, trust me. From jdnewmil at dcn.davis.ca.us Tue May 3 10:21:02 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Tue, 03 May 2011 01:21:02 -0700 Subject: [R] Rodbc quesion: how to reliably determine the data type? In-Reply-To: <934521.97792.qm@web110705.mail.gq1.yahoo.com> References: <934521.97792.qm@web110705.mail.gq1.yahoo.com> Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From jeanpaul.ebejer at inhibox.com Tue May 3 10:33:25 2011 From: jeanpaul.ebejer at inhibox.com (JP) Date: Tue, 3 May 2011 09:33:25 +0100 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: <4DBE6844.1090009@statistik.tu-dortmund.de> References: <4DBE6844.1090009@statistik.tu-dortmund.de> Message-ID: Thanks Uwe, How do I calculate the Z score and r value - please (once I have the p values)? Many Thanks JP 2011/5/2 Uwe Ligges : > To get the statsitics, you will have to run each wilcox.test ?manually. the > pairwise... version just extracts the p-values and adjusts them. > > Uwe Ligges > > > On 28.04.2011 15:18, JP wrote: >> >> Hi there, >> >> I am trying to do multiple pairwise Wilcoxon signed rank tests in a >> manner similar to: >> >> a<- c(runif(1000, min=1,max=50), rnorm(1000, 50), rnorm(1000, 49.9, >> 0.5), rgeom(1000, 0.5)) >> b<- c(rep("group_a", 1000), rep("group_b", 1000), rep("group_c", >> 1000), rep("group_d", 1000)) >> pairwise.wilcox.test(a, b, alternative="two.sided", >> p.adj="bonferroni", exact=F, paired=T) >> >> This gives me the following output: >> >> ? ? ? ? group_a group_b group_c >> group_b<2e-16 ?- ? ? ? - >> group_c<2e-16 ?0.25 ? ?- >> group_d<2e-16<2e-16<2e-16 >> >> (which is kind of expected since group_b and group_c have similar >> distributions) >> >> I have found that when doing a wilcoxon signed ranked test you should >> report: >> >> - The median value (and not the mean or sd, presumably because of the >> underlying potential non normal distribution) >> - The Z score (or value) >> - r >> - p value >> >> My questions are: >> >> - Are the above enough/correct values to report (some places even >> quote W and df) ? ?What else would you suggest? >> - How do I calculate the Z score and r for the above example? >> - How do I get each statistic from the pairwise.wilcox.test call? 
>>
>> Many Thanks
>> JP
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

From ligges at statistik.tu-dortmund.de Tue May 3 10:39:04 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Tue, 03 May 2011 10:39:04 +0200
Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test
In-Reply-To:
References: <4DBE6844.1090009@statistik.tu-dortmund.de>
Message-ID: <4DBFBF28.2040004@statistik.tu-dortmund.de>

On 03.05.2011 10:33, JP wrote:
> Thanks Uwe,
>
> How do I calculate the Z score and r value - please (once I have the p values)?

Actually you calculate the p value from the statistic rather than vice versa. pairwise.wilcox.test uses wilcox.test to calculate the separate tests and adjusts the p values for multiple testing afterwards. That's why I said you can manually run wilcox.test to get the separate statistics.

Uwe Ligges

>
> Many Thanks
> JP
>
>
>
> 2011/5/2 Uwe Ligges:
>> To get the statistics, you will have to run each wilcox.test manually. The
>> pairwise... version just extracts the p-values and adjusts them.
>> >> Uwe Ligges >> >> >> On 28.04.2011 15:18, JP wrote: >>> >>> Hi there, >>> >>> I am trying to do multiple pairwise Wilcoxon signed rank tests in a >>> manner similar to: >>> >>> a<- c(runif(1000, min=1,max=50), rnorm(1000, 50), rnorm(1000, 49.9, >>> 0.5), rgeom(1000, 0.5)) >>> b<- c(rep("group_a", 1000), rep("group_b", 1000), rep("group_c", >>> 1000), rep("group_d", 1000)) >>> pairwise.wilcox.test(a, b, alternative="two.sided", >>> p.adj="bonferroni", exact=F, paired=T) >>> >>> This gives me the following output: >>> >>> group_a group_b group_c >>> group_b<2e-16 - - >>> group_c<2e-16 0.25 - >>> group_d<2e-16<2e-16<2e-16 >>> >>> (which is kind of expected since group_b and group_c have similar >>> distributions) >>> >>> I have found that when doing a wilcoxon signed ranked test you should >>> report: >>> >>> - The median value (and not the mean or sd, presumably because of the >>> underlying potential non normal distribution) >>> - The Z score (or value) >>> - r >>> - p value >>> >>> My questions are: >>> >>> - Are the above enough/correct values to report (some places even >>> quote W and df) ? What else would you suggest? >>> - How do I calculate the Z score and r for the above example? >>> - How do I get each statistic from the pairwise.wilcox.test call? >>> >>> Many Thanks >>> JP >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> From jim at bitwrit.com.au Tue May 3 11:02:27 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Tue, 03 May 2011 19:02:27 +1000 Subject: [R] Help with coloring segments on a plot In-Reply-To: References: Message-ID: <4DBFC4A3.7020902@bitwrit.com.au> On 05/03/2011 04:26 AM, Paul Davison wrote: > Hi. I need a very short piece of help regarding colouring segments plotted > on a graph. 
>
> When I am plotting segments for the graph, I am using "red" and "darkgreen"
> for the values "1" and "2" respectively. Here's the relevant line of code in
> R:
>

Hi Paul,
Try this simple bit of code and see if it contains the elements you need to solve the problem.

# grab a bunch of easily distinguishable colors
# (a plain vector of color specifications; col= does not take an RGB matrix)
linecolors<-c(1:6,"#8888dd","brown","orange","pink")
# make an empty plot
plot(0:10,type="n")
# plot segments with the colors
segments(1:10,0:9,2:11,1:10,col=linecolors)
# stick some points on to better see the colors
points(1:10,0:9,pch=19,col=linecolors)

Jim

From k.jewell at campden.co.uk Tue May 3 11:02:15 2011
From: k.jewell at campden.co.uk (Keith Jewell)
Date: Tue, 3 May 2011 10:02:15 +0100
Subject: [R] QQ plot for normality testing
References:
Message-ID:

I have found this web page useful
http://www.cms.murdoch.edu.au/areas/maths/statsnotes/samplestats/qqplot.html

Your mileage may vary.

Keith J

"Matevz Pavlic" wrote in message news:AD5CA6183570B54F92AA45CE2619F9B90120801E at gi-zrmk.si...
> Hi all,
>
> I am trying to test whether the distribution of my samples is normal with
> a QQ plot.
>
> I have values of water content in clays for a few hundred samples.
> Is the code:
>
> qqnorm(w) #w being water content
>
> qqline(w)
>
> sufficient?
>
> How do I know when I get the plots which distribution is normal and which
> is not?
>
> Thanks, m
>
>
> 	[[alternative HTML version deleted]]
>

From fomcl at yahoo.com Tue May 3 11:10:31 2011
From: fomcl at yahoo.com (Albert-Jan Roskam)
Date: Tue, 3 May 2011 02:10:31 -0700 (PDT)
Subject: [R] Rodbc quesion: how to reliably determine the data type?
In-Reply-To:
References: <934521.97792.qm@web110705.mail.gq1.yahoo.com>
Message-ID: <428124.4294.qm@web110708.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From ligges at statistik.tu-dortmund.de Tue May 3 12:55:19 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 03 May 2011 12:55:19 +0200 Subject: [R] problem with Sweave and pdflatex In-Reply-To: <000001cc096b$c913a400$5b3aec00$@lehmann62@freenet.de> References: <000301cc08c7$b4c2ef50$1e48cdf0$@lehmann62@freenet.de> <4DBED318.7070808@statistik.tu-dortmund.de> <000001cc096b$c913a400$5b3aec00$@lehmann62@freenet.de> Message-ID: <4DBFDF17.4060301@statistik.tu-dortmund.de> At least it works for most of us.... Uwe Ligges On 03.05.2011 10:26, Frank Lehmann wrote: > I set the path with no spaces and run as administrator, but the problem is > not fixed. I'm not quite shure, but I can't remember the problem bevore R > version 2.13. Could it be, that the new R version causes that problem? > > Frank Lehmann > > -----Urspr?ngliche Nachricht----- > Von: Uwe Ligges [mailto:ligges at statistik.tu-dortmund.de] > Gesendet: Montag, 2. Mai 2011 17:52 > An: Frank Lehmann > Cc: r-help at r-project.org > Betreff: Re: [R] problem with Sweave and pdflatex > > Have you checked the permissions in the working directory? Is there a > blank in your path (LaTeX does not like spaces in the path). > > Uwe Ligges > > > On 02.05.2011 14:51, Frank Lehmann wrote: >> Hallo, >> >> >> >> when I plot figures with Sweave, I get the message "pdflatex: Permission >> denied". This problem only occurs while working on local system. When I > copy >> the *.rnw-File to my AFS drive, there is no problem at all. 
>> >> >> >> Here is a small example: >> >> >> >> \documentclass{scrartcl} >> >> \usepackage[OT1]{fontenc} >> >> \usepackage[latin1]{inputenc} >> >> \usepackage[ngerman]{babel} >> >> \usepackage[pdftex]{graphicx} >> >> \usepackage{Sweave} >> >> >> >> \begin{document} >> >> >> >> \setkeys{Gin}{width=\textwidth} >> >> \begin{figure}[htbp] >> >> <>= >> >> x<- 1:10 >> >> plot(x) >> >> @ >> >> \caption{Eine einfache Grafik} >> >> \end{figure} >> >> >> >> \end{document} >> >> >> >> Does anyone have an idea, how to solve that problem? Im working with > Windows >> XP. >> >> >> >> Thanks! >> >> >> >> Frank >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > From nuncio.m at gmail.com Tue May 3 13:16:17 2011 From: nuncio.m at gmail.com (nuncio m) Date: Tue, 3 May 2011 16:46:17 +0530 Subject: [R] removing columns Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Tue May 3 13:17:05 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 03 May 2011 13:17:05 +0200 Subject: [R] adaptIntegrate - how to pass additional parameters to the integrand In-Reply-To: <1304397791609-3491701.post@n4.nabble.com> References: <1304397791609-3491701.post@n4.nabble.com> Message-ID: <4DBFE431.3070904@statistik.tu-dortmund.de> On 03.05.2011 06:43, HC wrote: > Hello, > > I am trying to use adaptIntegrate function but I need to pass on a few > additional parameters to the integrand. However, this function seems not to > have the flexibility of passing on such additional parameters. > > Am I missing something or this is a known limitation. Is there a good > alternative to such restrictions, if there at all are? 

Looks like you are talking about the cubature package rather than about base R.

For the latter question: Please ask the package maintainer rather than the list. Ideally send him code to implement the requested feature and the maintainer will probably add your code. Not all package maintainers read R-help.

For an ad hoc solution: Just use

adaptIntegrate(function(x, argA=a, argB=b) f(x, argA=argA, argB=argB), ......)

in order to set additional arguments for the function call.

Uwe Ligges

> Many thanks for your time.
> HC
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/adaptIntegrate-how-to-pass-additional-parameters-to-the-integrand-tp3491701p3491701.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From ligges at statistik.tu-dortmund.de Tue May 3 13:24:10 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Tue, 03 May 2011 13:24:10 +0200
Subject: [R] removing columns
In-Reply-To:
References:
Message-ID: <4DBFE5DA.701@statistik.tu-dortmund.de>

On 03.05.2011 13:16, nuncio m wrote:
> Hi list,
>
> I have a matrix in which all elements of some columns are zeroes. Is it
> possible to remove these columns

# keep every column that has at least one non-zero entry
Xnew <- X[ , colSums(X != 0) > 0, drop=FALSE]

Uwe Ligges

> and create a new matrix
> nuncio
>
>
>

From ligges at statistik.tu-dortmund.de Tue May 3 13:26:15 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Tue, 03 May 2011 13:26:15 +0200
Subject: [R] install rdcomclient source
In-Reply-To:
References:
Message-ID: <4DBFE657.8090509@statistik.tu-dortmund.de>

On 02.05.2011 23:48, Richard Wang wrote:
> Hi,
>
> I'd like to ask an installation question.
I want to install a source code
> through the following command,
> R CMD INSTALL RDCOMClient


This is intended to be used in the shell of your OS (assuming Windows given the package), not in R.

From within R use install.packages("RDCOMClient", type="source") if you really want to install from source.

Uwe Ligges

> but get Error: unexpected symbol in "r cmd"
>
> Please let me know if I miss anything. I have my utils package loaded.
>
> Thanks,
> Richard
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From biomathjdaily at gmail.com Tue May 3 13:59:01 2011
From: biomathjdaily at gmail.com (Jonathan Daily)
Date: Tue, 3 May 2011 07:59:01 -0400
Subject: [R] Problems with Rterm 2.13.0 - but not RGui
In-Reply-To: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F6B3@DJFEXMBX01.djf.agrsci.dk>
References: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F69B@DJFEXMBX01.djf.agrsci.dk>
 <1C56F3EE22DF4F458FE02F58B96D4150555EA9F6B3@DJFEXMBX01.djf.agrsci.dk>
Message-ID:

Ah ok. I suppose the fix is to get the hard path (C:/Program Files/...) on the search path and remove the symlink from the search path. Does that work?

2011/5/3 Stefan McKinnon Høj-Edwards :
> Yes, the message is pretty clear, but it has nothing to do with running as admin.
> I have just tried to start a command line with admin privileges and the error still occurs.
> Regarding Rgui, I started it by opening the shortcut.
>
> Now I've tracked down the problem a bit, and the problem appears to be connected to which folder R is called from.
> And by sheer luck I've resolved the problem:
> In all previous versions of Windows, on the Danish editions, the "C:\Program Files" directory was called "C:\Programmer". This appears to be the case in Windows 7, but "C:\Programmer" is a symbolic link (hard/soft?) to "C:\Program Files". And apparently, I've been calling R from "C:\Programmer" instead of "C:\Program Files" which gave the problem.
> When/how/why I changed the PATH variable to the symbolic link is unclear, but a quick check reveals that the problem did not exist in R 2.12.1:
> C:\Programmer\R\R-2.12.1\bin\i386\Rterm  # No problem
> C:\Programmer\R\R-2.13.0\bin\i386\Rterm  # Problem
>
> I will submit a bug report on this.
>
> Kind regards,
> Stefan McKinnon Edwards
>
>
> -----Original message-----
> From: Jonathan Daily [mailto:biomathjdaily at gmail.com]
> Sent: 2 May 2011 16:59
> To: Stefan McKinnon Høj-Edwards
> Cc: r-help at r-project.org
> Subject: Re: [R] Problems with Rterm 2.13.0 - but not RGui
>
> The message is pretty clear. Access denied means you don't have
> permission to access the path. This also explains why the packages
> fail to load - you don't have access to R's package library. It most
> likely works on RGui because you are clicking it/running it as admin
> (you did not specify how you ran RGui).
>
> 2011/5/2 Stefan McKinnon Høj-Edwards :
>> Hi all,
>>
>> I have just installed R 2.13.0 and I am experiencing problems with the terminal, but not with the GUI interface.
>> I am on Windows 7.
>>
>> When running "R" or "Rterm" from a commandline I receive the following:
>>
>> Warning message:
>> In normalizePath(path.expand(path), winslash, mustWork) :
>>   path[3]="C:/Programmer/R/R-2.13.0/library": Adgang nægtet
>>
>> R version 2.13.0 (2011-04-13)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> Warning message:
>> package "methods" in options("defaultPackages") was not found
>> During startup - Warning messages:
>> 1: package 'datasets' in options("defaultPackages") was not found
>> 2: package 'utils' in options("defaultPackages") was not found
>> 3: package 'grDevices' in options("defaultPackages") was not found
>> 4: package 'graphics' in options("defaultPackages") was not found
>> 5: package 'stats' in options("defaultPackages") was not found
>> 6: package 'methods' in options("defaultPackages") was not found
>>
>>
>> Notice: "C:/Programmer/" is the Danish equivalent of "C:/Program Files".
>> The first error "Adgang nægtet" is directly translated to "Access denied".
>>
>> Any suggestions as how to fix this?
>>
>> Kind regards,
>> Stefan McKinnon Edwards
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> ===============================================
> Jon Daily
> Technician
> ===============================================
> #!/usr/bin/env outside
> # It's great, trust me.
>

--
===============================================
Jon Daily
Technician
===============================================
#!/usr/bin/env outside
# It's great, trust me.
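
A note on the startup warnings quoted above: they suggest that R could not read its own library directory, so none of the default packages were found. Below is a minimal sketch of how one might confirm that from inside the affected session, using only base R. It was not posted in the thread; the variable names are made up for illustration, and I would treat the result of file.access() as a hint only, particularly on Windows.

```r
# Sketch (assumption: run inside the affected R session).
lib  <- .Library                       # base library of the running R
pkgs <- getOption("defaultPackages")   # packages R tries to attach at startup

# file.access() returns 0 when the requested access is permitted (mode 4 = read)
lib_readable <- file.access(lib, mode = 4) == 0

# A package counts as "found" here if its directory exists inside the library
found <- file.exists(file.path(lib, pkgs))
names(found) <- pkgs

print(lib)
print(lib_readable)
print(found)
```

In a healthy installation one would expect every entry of found to be TRUE; in the situation reported above, the entries would presumably be FALSE or the library reported as unreadable.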
From Soren.Hojsgaard at agrsci.dk Tue May 3 14:04:29 2011
From: Soren.Hojsgaard at agrsci.dk (Søren Højsgaard)
Date: Tue, 3 May 2011 14:04:29 +0200
Subject: [R] Problems with Rterm 2.13.0 - but not RGui
In-Reply-To:
References: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F69B@DJFEXMBX01.djf.agrsci.dk>
 <1C56F3EE22DF4F458FE02F58B96D4150555EA9F6B3@DJFEXMBX01.djf.agrsci.dk>
Message-ID: <9F0721FDD4F12D4B95AD894274F388EC020C641D8645@DJFEXMBX01.djf.agrsci.dk>

A safe way out of this mess is to install R somewhere else. For example, create a directory c:\Programs and install R there.

Regards
Søren

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Jonathan Daily
Sent: 3 May 2011 13:59
To: Stefan McKinnon Høj-Edwards
Cc: r-help at r-project.org
Subject: Re: [R] Problems with Rterm 2.13.0 - but not RGui

Ah ok. I suppose the fix is to get the hard path (C:/Program Files/...) on the search path and remove the symlink from the search path. Does that work?
From richcmwang at gmail.com Tue May 3 14:09:05 2011
From: richcmwang at gmail.com (Richard Wang)
Date: Tue, 3 May 2011 13:09:05 +0100
Subject: [R] install rdcomclient source
In-Reply-To: <4DBFE657.8090509@statistik.tu-dortmund.de>
References: <4DBFE657.8090509@statistik.tu-dortmund.de>
Message-ID:

Thanks. One more question. If I use install.packages, do I need to install Rtools, or is the utils package sufficient?

Thanks,
Richard


On 3 May 2011, at 12:26, Uwe Ligges wrote:

>
>
> On 02.05.2011 23:48, Richard Wang wrote:
>> Hi,
>>
>> I'd like to ask an installation question. I want to install a source code
>> through the following command,
>> R CMD INSTALL RDCOMClient
>
>
> This is intended to be used in the shell of your OS (assuming Windows given the package), not in R.
>
> From within R use install.packages("RDCOMClient", type="source") if you really want to install from source.
>
> Uwe Ligges
>
>> but get Error: unexpected symbol in "r cmd"
>>
>> Please let me know if I miss anything. I have my utils package loaded.
>>
>> Thanks,
>> Richard
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

From ligges at statistik.tu-dortmund.de Tue May 3 14:19:30 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Tue, 03 May 2011 14:19:30 +0200
Subject: [R] install rdcomclient source
In-Reply-To:
References: <4DBFE657.8090509@statistik.tu-dortmund.de>
Message-ID: <4DBFF2D2.1000906@statistik.tu-dortmund.de>

On 03.05.2011 14:09, Richard Wang wrote:
> Thanks. One more question. If I use install.packages, do I need to install Rtools, or is the utils package sufficient?


It depends on how demanding the package is. For the one you mentioned, you will need the Rtools, since C/C++ sources are to be compiled.
And it won't work out of the box, since some manual tweaks are required - at least the last time I tried.

Do you know that the package is available from CRAN extras in the form of a Windows binary? install.packages() without the type="source" argument should work right away.

Uwe Ligges


> Thanks,
> Richard
>
>
> On 3 May 2011, at 12:26, Uwe Ligges wrote:
>
>>
>>
>> On 02.05.2011 23:48, Richard Wang wrote:
>>> Hi,
>>>
>>> I'd like to ask an installation question. I want to install a source code
>>> through the following command,
>>> R CMD INSTALL RDCOMClient
>>
>>
>> This is intended to be used in the shell of your OS (assuming Windows given the package), not in R.
>>
>> From within R use install.packages("RDCOMClient", type="source") if you really want to install from source.
>>
>> Uwe Ligges
>>
>>> but get Error: unexpected symbol in "r cmd"
>>>
>>> Please let me know if I miss anything. I have my utils package loaded.
>>>
>>> Thanks,
>>> Richard
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

From friendly at yorku.ca Tue May 3 14:52:21 2011
From: friendly at yorku.ca (Michael Friendly)
Date: Tue, 3 May 2011 08:52:21 -0400
Subject: [R] latex, eps graphics and transparent colors / sam2p
In-Reply-To:
References: <4DA5DDA9.4060601@yorku.ca>
Message-ID: <4DBFFA85.6050800@yorku.ca>

On 04/13/2011 05:06 PM, Ben Bolker wrote:
> Thomas Lumley <tlumley at uw.edu> writes:
>
>>
>> On Thu, Apr 14, 2011 at 5:30 AM,
>> Michael Friendly <friendly at yorku.ca> wrote:
>>> I have a diagram to be included in latex, where all my figures are .eps
>>> graphics (so pdflatex is not an option)
>>
>> You could use the pdf() device and then use pdf2ps to convert to PostScript.
>
> Clever.
>
> [snip]
>
>> There's now an adjustcolor() function in base R to do this.
>>
>
> That makes my solution more or less obsolete.
>

This is a follow-up to this thread, for which I thank everyone who replied. I could have fiddled with adjustcolor() to avoid using transparent colors, but instead did some tests on generating .png or .pdf files with transparent colors and then converting to .eps.

I tried the following, in various combinations [running on ubuntu 10.02 linux]

input file: foo.{pdf,png}
converters: pdf2ps, pdftops, convert (ImageMagick)

All of these gave really *huge* output files, roughly 100 times the size of the original or more (see the listings below).

Finally, I remembered (the badly named) sam2p utility,
http://pts.szit.bme.hu/sam2p/
http://code.google.com/p/sam2p/
I'm not sure what magic it uses for compression, but the results are quite impressive compared with the competition.

Bottom line: the combination of .pdf + sam2p seems to work best in my tests, below. The result is even smaller than the input file, and I can't tell the difference in the onscreen display.

% ls -l foo.*
-rw-r--r-- 1 friendly staff 11794 2011-04-29 08:31 foo.pdf
-rw-r--r-- 1 friendly staff 20775 2011-04-29 08:32 foo.png
euclid: /tmp % pdf2ps foo.pdf foo-pdf2ps.eps
euclid: /tmp % pdftops foo.pdf foo-pdftops.eps
euclid: /tmp % convert foo.pdf foo-convert-pdf.eps
euclid: /tmp % convert foo.png foo-convert-png.eps

euclid: /tmp % sam2p foo.png foo-sam2p-png.eps
This is sam2p v0.47-1.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: foo.png
sam2p: Notice: writeTTT: using template: l23
sam2p: Notice: applyProfile: applied OutputRule #37
sam2p: Notice: job: written OutputFile: foo-sam2p-png.eps
Success.

euclid: /tmp % sam2p foo.pdf foo-sam2p-pdf.eps
This is sam2p v0.47-1.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
gs_cmd=(gs -r72 -q -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dLastPage=1 -sDEVICE=pnmraw -dDELAYSAFER -dBATCH -dNOPAUSE -sOutputFile=%D -- %S)
sam2p: Notice: job: read InputFile: foo.pdf
sam2p: Notice: writeTTT: using template: l23
sam2p: Notice: applyProfile: applied OutputRule #27
sam2p: Notice: job: written OutputFile: foo-sam2p-pdf.eps
Success.

euclid: /tmp % ls -l foo*
-rw-r--r-- 1 friendly staff 1139852 2011-05-03 08:41 foo-convert-pdf.eps
-rw-r--r-- 1 friendly staff 2924149 2011-05-03 08:41 foo-convert-png.eps
-rw-r--r-- 1 friendly staff   11794 2011-04-29 08:31 foo.pdf
-rw-r--r-- 1 friendly staff 1687255 2011-05-03 08:41 foo-pdf2ps.eps
-rw-r--r-- 1 friendly staff 2775130 2011-05-03 08:41 foo-pdftops.eps
-rw-r--r-- 1 friendly staff   20775 2011-04-29 08:32 foo.png
-rw-r--r-- 1 friendly staff    8701 2011-05-03 08:44 foo-sam2p-pdf.eps
-rw-r--r-- 1 friendly staff   35926 2011-05-03 08:42 foo-sam2p-png.eps
euclid: /tmp %

From richcmwang at gmail.com Tue May 3 15:05:20 2011
From: richcmwang at gmail.com (Richard Wang)
Date: Tue, 3 May 2011 14:05:20 +0100
Subject: [R] install rdcomclient source
In-Reply-To: <4DBFF2D2.1000906@statistik.tu-dortmund.de>
References: <4DBFE657.8090509@statistik.tu-dortmund.de>
 <4DBFF2D2.1000906@statistik.tu-dortmund.de>
Message-ID: <1AD2AA98-CB8D-46B6-B24B-AC8B08E9DCED@gmail.com>

Thanks. I didn't know that. I just found it on Brian Ripley's page. Is this the CRAN extras?

Thanks
Richard


On 3 May 2011, at 13:19, Uwe Ligges wrote:

>
>
> On 03.05.2011 14:09, Richard Wang wrote:
>> Thanks. One more question. If I use install.packages, do I need to install Rtools, or is the utils package sufficient?
>
>
> It depends on how demanding the package is. For the one you mentioned, you will need the Rtools, since C/C++ sources are to be compiled.
And it won't work out of the box, since some manual tweaks are required - at least the last time I tried. > > Do you know that the package is available from CRAN extras if form of a Windows binary? install.packages() without the type="source" argument should work right away. > > Uwe Ligges > > >> Thanks, >> Richard >> >> >> On 3 May 2011, at 12:26, Uwe Ligges wrote: >> >>> >>> >>> On 02.05.2011 23:48, Richard Wang wrote: >>>> Hi, >>>> >>>> I'd like to ask a installation question. I want to install a source code >>>> through the following command, >>>> R CMD INSTALL RDCOMClient >>> >>> >>> This is intended to be used in the shell of your OS (assuming Windows given the package), not in R. >>> >>> From within R use install.packages("RDCOMClient", type="source") if you really want to install from source. >>> >>> Uwe Ligges >>> >>>> but get Error: unexpected symbol in "r cmd" >>>> >>>> Please let know if I miss anything. I my utils package loaded. >>>> >>>> Thanks, >>>> Richard >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. From bbolker at gmail.com Tue May 3 15:43:58 2011 From: bbolker at gmail.com (Ben Bolker) Date: Tue, 3 May 2011 09:43:58 -0400 Subject: [R] Categorical bubble plot In-Reply-To: References: <4DA845DD.6080008@gmail.com> Message-ID: <4DC0069E.4080604@gmail.com> On 11-05-03 09:23 AM, Jurgens de Bruin wrote: > So I have been playing with bubble plot and I was able to create bubble > plot with relatively simple data. I am no having problems when I > increase the complexity of my data. 
Below is a example of my data: > > phytochemical MainClass FitValues > Name A 0.5 > A 0.7 > A 0.8 > B 0.1 > B 0.4 > B 0.6 > > Phytochemical in on the y-axis and MainClass on the x-axis and FitValues > is the color and size of the bubble so each x-y has more than one > FitValue. Will the following still Work? > > bubble = ggplot2.ggplot(dataf) + \ > ggplot2.aes_string(x='mainclass', y='phytochemcial', > col='fitvalue',size='fitvalue') + \ > ggplot2.geom_point() > > This is done in rpy2 so it may not look like R. > I would expect so, although having the bubbles overlaid on each other may make it hard to see the results clearly. Did you try it and see what happens? In general it is better if you can provide us with full/reproducible examples so we don't have to speculate. You might want to set the points partly transparent (e.g. ggplot2.geom_point(alpha=0.5)). However, at this point I would also consider changing the way you are plotting the data. As I may have said previously (or may have been tempted but kept my mouth shut), colour and size are at the bottom of the "Cleveland hierarchy" -- that is, it's very hard to to reliably interpret quantitative information on the basis of size and colour. A dot chart of some form would probably convey the information more accurately. Ben Bolker > Thanks!!! > > On 21 April 2011 09:49, Jurgens de Bruin > wrote: > > Thanks for all the help!!! > > > On 16 April 2011 08:32, Tal Galili > wrote: > > Hi Jurgens, > > In the following post I show how to use balloonplot from qplots > to do more or less what you ask: > http://www.r-statistics.com/2010/02/nutritional-supplements-efficacy-score-graphing-plots-of-current-studies-results-using-r/ > > p.s: the code has a slight modification to it, so to handle > overlapping texts. Dear Jim, I'd be happy if some of it might > be considered into the official release. 
> > Cheers, > Tal > > ----------------Contact > Details:------------------------------------------------------- > Contact me: Tal.Galili at gmail.com > | 972-52-7275845 > Read me: www.talgalili.com (Hebrew) | > www.biostatistics.co.il > (Hebrew) | www.r-statistics.com > (English) > ---------------------------------------------------------------------------------------------- > > > > > On Fri, Apr 15, 2011 at 4:19 PM, Ben Bolker > wrote: > > On 04/15/2011 01:13 AM, Jurgens de Bruin wrote: > > Thanks for the reply... > > > > with reproducible I am believe you require a dataset? > > yes -- but you can make one up if you like. e.g. > > > dd <- expand.grid(drugclass=LETTERS[1:5], > plant=c("cactus","sequoia","mistletoe")) > set.seed(101) > dd$fitvalue <- runif(nrow(dd)) > > library(ggplot2) > ggplot(dd,aes(x=drugclass,y=plant,colour=fitvalue,size=fitvalue))+ > geom_point() > > By the way, I think you could represent your data much more > clearly this way: the "Cleveland hierarchy" says that it's > easier > to assess quantitative values plotted along a common scale > than via > size or colour ... > > ggplot(dd,aes(x=drugclass,y=fitvalue,colour=plant))+ > geom_point()+geom_line(aes(group=plant)) > > > > > The size of the bubbles will be related to the fitvalues. > > > > > > > > On 14 April 2011 17:57, Ben Bolker > > >> wrote: > > > > Jurgens de Bruin gmail.com > > writes: > > > > > > > > Hi, > > > > > > I do not have much R experience just the basics, so > please excuse > > > any obvious questions. > > > > > > I would like to create bubble plot that have > Categorical data on > > the x and y > > > axis and then the diameter if the bubble the value > related to x and y. > > > Attached to the email is a pic of what I would like > to do. > > > > > > > A reproducible example would be great. 
> > > > something along the lines of > > > > library(ggplot2) > > > ggplot(mydata,aes(x=drugclass,y=plant,colour=fitvalue,size=?))+geom_point() > > > > it's not clear from your description what determines > the size. > > From a labeling point of view, switching x and y > might be useful. > > > > ______________________________________________ > > R-help at r-project.org > > > mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, > reproducible code. > > > > > > > > > > -- > > Regards/Groete/Mit freundlichen > Gr??en/recuerdos/meilleures salutations/ > > distinti saluti/siong/du? y?/?????? > > > > Jurgens de Bruin > > ______________________________________________ > > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible > code. > > > > > > -- > Regards/Groete/Mit freundlichen Gr??en/recuerdos/meilleures salutations/ > distinti saluti/siong/du? y?/?????? > > Jurgens de Bruin > > > > > -- > Regards/Groete/Mit freundlichen Gr??en/recuerdos/meilleures salutations/ > distinti saluti/siong/du? y?/?????? > > Jurgens de Bruin From dwinsemius at comcast.net Tue May 3 15:50:02 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 3 May 2011 06:50:02 -0700 Subject: [R] Rodbc quesion: how to reliably determine the data type? In-Reply-To: <428124.4294.qm@web110708.mail.gq1.yahoo.com> References: <934521.97792.qm@web110705.mail.gq1.yahoo.com> <428124.4294.qm@web110708.mail.gq1.yahoo.com> Message-ID: <381D9B24-9A67-4F29-9557-0DC9E933FD4B@comcast.net> On May 3, 2011, at 2:10 AM, Albert-Jan Roskam wrote: > Hi Jeff, > > Ah, thanks a lot! Yes, meanwhile I also switched to csv. 
This still > requires > knowledge about the regional settings (Sys.getlocale), but it's a > lot more > transparent. > > > I'm quite new to R and I must say that stuff like this is eating up > a LOT of my > time. All those invisible data type conversions are driving me nuts. > StringsAsFactors=F should be the default, for instance. > You can set that (if spelled correctly) with options in your .Rprofile, and some knowledgeable people have made that choice. -- David. > Cheers!! > Albert-Jan > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > All right, but apart from the sanitation, the medicine, education, > wine, public > order, irrigation, roads, a fresh water system, and public health, > what have the > Romans ever done for us? > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > David Winsemius, MD Heritage Laboratories West Hartford, CT From dwinsemius at comcast.net Tue May 3 15:55:19 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 3 May 2011 06:55:19 -0700 Subject: [R] removing columns In-Reply-To: <4DBFE5DA.701@statistik.tu-dortmund.de> References: <4DBFE5DA.701@statistik.tu-dortmund.de> Message-ID: <106B366B-E59E-4126-957F-700C70B5159F@comcast.net> On May 3, 2011, at 4:24 AM, Uwe Ligges wrote: > > > On 03.05.2011 13:16, nuncio m wrote: >> Hi list, >> >> I have a matrix with all elements of some columns are zeroes. 
Is it
>> possible to remove these columns:
>
>
> Xnew <- X[ , as.logical(colSums(X)), drop=FALSE]

A counter-example (the second column sums to zero without being all zeroes):

X <- matrix(c(1,1,1,-1,0,1,1,1,1),3)

A fix:

 > Xnew <- X[ , as.logical(colSums(abs(X))), drop=FALSE]
 > Xnew
     [,1] [,2] [,3]
[1,]    1   -1    1
[2,]    1    0    1
[3,]    1    1    1

>
> Uwe Ligges
>
>
>> and create a new matrix
>> nuncio

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

From projectbasu at gmail.com Tue May 3 15:12:46 2011
From: projectbasu at gmail.com (swaraj basu)
Date: Tue, 3 May 2011 15:12:46 +0200
Subject: [R] Axis trouble
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From ckalexa2 at ncsu.edu Tue May 3 15:45:36 2011
From: ckalexa2 at ncsu.edu (Clemontina Alexander)
Date: Tue, 3 May 2011 09:45:36 -0400
Subject: [R] Lasso with Categorical Variables
In-Reply-To: <3132677569885079094@unknownmsgid>
References: <8E59905B-887C-4452-9526-6A73C0EAD634@comcast.net>
 <3132677569885079094@unknownmsgid>
Message-ID:

Thanks for all your help and I apologize for not being clear in the beginning. I will try the "group lasso" packages. From the paper, it seems like that is what I want to do.
Thanks again!



On Tue, May 3, 2011 at 2:40 AM, Nick Sabbe wrote:
> For performance reasons, I advise on using the following function instead of
> model.matrix:
>
> factorsToDummyVariables<-function(dfr, betweenColAndLevel="")
> {
>     nc<-dim(dfr)[2]
>     firstRow<-dfr[1,]
>     coln<-colnames(dfr)
>     retval<-do.call(cbind, lapply(seq(nc), function(ci){
>         if(is.factor(firstRow[,ci]))
>         {
>             #one 0/1 column per level, except the first (baseline) level
>             lvls<-levels(firstRow[,ci])[-1]
>             stretchedcols<-sapply(lvls, function(lvl){
>                 rv<-dfr[,ci]==lvl
>                 mode(rv)<-"integer"
>                 return(rv)
>             })
>             if(!is.matrix(stretchedcols)) stretchedcols<-matrix(stretchedcols, nrow=1)
>             colnames(stretchedcols)<-paste(coln[ci], lvls, sep=betweenColAndLevel)
>             return(stretchedcols)
>         }
>         else
>         {
>             #non-factor columns are passed through as a one-column matrix
>             curcol<-matrix(dfr[,ci], ncol=1)
>             colnames(curcol)<-coln[ci]
>             return(curcol)
>         }
>     }))
>     rownames(retval)<-rownames(dfr)
>     return(retval)
> }
>
>
> Just for comparison: here is my old version of the same function, using
> model.matrix:
>
> factorsToDummyVariables.old<-function(dfrPredictors,
>     form=paste("~",paste(colnames(dfrPredictors), collapse="+"), sep=""))
> {
>     #note: this function seems to operate quite slowly!
>     #Because it is used often, it may be worth improving its speed
>     dfrTmp<-model.frame(dfrPredictors, na.action=na.pass)
>     frm<-as.formula(form)
>     mm<-model.matrix(frm, data=dfrTmp)
>     retval<-as.matrix(mm)[,-1]
>
>     return(retval)
> }
>
> In a testcase with a reasonably big dataset, I compared the speeds:
>
> #system.time(tmp.fd.convds.full.man<-manualFactorsToDummyVariables(ds))
> ##   user  system elapsed
> ##   9.44    0.00    9.48
> #system.time(tmp.fd.convds.full<-factorsToDummyVariables.old(ds))
> ##   user  system elapsed
> ##  15.49    0.00   15.64
> #system.time(invisible(factorsToDummyVariables (ds[10,])))
> ##   user  system elapsed
> ##   0.36    0.00    0.36
> #system.time(invisible(factorsToDummyVariables.old (ds[10,])))
> ##   user  system elapsed
> ##   2.18    0.00    2.20
> #system.time(invisible(factorsToDummyVariables (ds[20:30,])))
> ##   user  system elapsed
> ##   0.34    0.00    0.38
> #system.time(invisible(factorsToDummyVariables.old (ds[20:30,])))
> ##   user  system elapsed
> ##   2.11    0.00    2.15
>
> If you have to do this quite often, the difference surely adds up...
> More improvements may be possible.
> This function only works if you don't include interactions, though.
>
>
> Nick Sabbe
> --
> ping: nick.sabbe at ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
>
> -- Do Not Disapprove
>
>
>
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of David Winsemius
> Sent: Monday 2 May 2011 20:48
> To: Steve Lianoglou
> Cc: r-help at r-project.org
> Subject: Re: [R] Lasso with Categorical Variables
>
>
> On May 2, 2011, at 10:51 AM, Steve Lianoglou wrote:
>
>> Hi,
>>
>> On Mon, May 2, 2011 at 12:45 PM, Clemontina Alexander wrote:
>>> Hi! This is my first time posting. I've read the general rules and
>>> guidelines, but please bear with me if I make some fatal error in
>>> posting. Anyway, I have a continuous response and 29 predictors made
>>> up of continuous variables and nominal and ordinal categorical
>>> variables. I'd like to do lasso on these, but I get an error. The way
>>> I am using "lars" doesn't allow for the factors. Is there a special
>>> option or some other method in order to do lasso with cat. variables?
>>>
>>> Here is an example (considering ordinal variables as just nominal):
>>>
>>> set.seed(1)
>>> Y <- rnorm(10,0,1)
>>> X1 <- factor(sample(x=LETTERS[1:4], size=10, replace = TRUE))
>>> X2 <- factor(sample(x=LETTERS[5:10], size=10, replace = TRUE))
>>> X3 <- sample(x=30:55, size=10, replace=TRUE)  # think age
>>> X4 <- rchisq(10, df=4, ncp=0)
>>> X <- data.frame(X1,X2,X3,X4)
>>>
>>>> str(X)
>>> 'data.frame':   10 obs. of  4 variables:
>>>  $ X1: Factor w/ 4 levels "A","B","C","D": 4 1 3 1 2 2 1 2 4 2
>>>  $ X2: Factor w/ 5 levels "E","F","G","H",..: 3 4 3 2 5 5 5 1 5 3
>>>  $ X3: int  51 46 50 44 43 50 30 42 49 48
>>>  $ X4: num  2.86 1.55 1.94 2.45 2.75 ...
>>>
>>>
>>> I'd like to do:
>>> obj <- lars(x=X, y=Y, type = "lasso")
>>>
>>> Instead, what I have been doing is converting all data to continuous
>>> but I think this is really bad!
>>
>> Yeah, it is.
>>
>> Check out the "Categorical Predictor Variables" section here for a way
>> to handle such predictor vars:
>> http://www.psychstat.missouristate.edu/multibook/mlt08m.html
>
> Steve's citation is somewhat helpful, but not sufficient to take the
> next steps. You can find details regarding the mechanics of typical
> linear regression in R on the ?lm page where you find that the factor
> variables are typically handled by model.matrix. See below:
>
>  > model.matrix(~X1 + X2 + X3 + X4, X)
>     (Intercept) X1B X1C X1D X2F X2G X2H X2I X3        X4
> 1            1   0   0   1   0   1   0   0 51 2.8640884
> 2            1   0   0   0   0   0   1   0 46 1.5462243
> 3            1   0   1   0   0   1   0   0 50 1.9430901
> 4            1   0   0   0   1   0   0   0 44 2.4504180
> 5            1   1   0   0   0   0   0   1 43 2.7535052
> 6            1   1   0   0   0   0   0   1 50 1.6200326
> 7            1   0   0   0   0   0   0   1 30 0.5750533
> 8            1   1   0   0   0   0   0   0 42 5.9224777
> 9            1   0   0   1   0   0   0   1 49 2.0401528
> 10           1   1   0   0   0   1   0   0 48 6.2995288
> attr(,"assign")
>  [1] 0 1 1 1 2 2 2 2 3 4
> attr(,"contrasts")
> attr(,"contrasts")$X1
> [1] "contr.treatment"
>
> attr(,"contrasts")$X2
> [1] "contr.treatment"
>
> The numeric variables are passed through, while the dummy variables
> for factor columns are constructed (as treatment contrasts) and the
> whole thing is returned in a neat package.
>
> --
> David.
>>
>> HTH,
>> -steve
>>
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

From kparamas at asu.edu Tue May 3 09:42:55 2011
From: kparamas at asu.edu (kparamas)
Date: Tue, 3 May 2011 00:42:55 -0700 (PDT)
Subject: [R] Watts Strogatz game
Message-ID: <1304408575725-3491922.post@n4.nabble.com>

Hi,

I have an Erdos-Renyi game with 6000 nodes and probability 0.003.

g1 = erdos.renyi.game(6000, 0.003)

How can I create a Watts-Strogatz game with the same probability?

g1 = watts.strogatz.game(1, 6000, ?, ?)

What should the third and fourth arguments to this function be?

--
View this message in context: http://r.789695.n4.nabble.com/Watts-Strogatz-game-tp3491922p3491922.html
Sent from the R help mailing list archive at Nabble.com.

From frank.lehmann62 at freenet.de Tue May 3 10:26:21 2011
From: frank.lehmann62 at freenet.de (Frank Lehmann)
Date: Tue, 3 May 2011 10:26:21 +0200
Subject: [R] problem with Sweave and pdflatex
In-Reply-To: <4DBED318.7070808@statistik.tu-dortmund.de>
References: <000301cc08c7$b4c2ef50$1e48cdf0$@lehmann62@freenet.de>
 <4DBED318.7070808@statistik.tu-dortmund.de>
Message-ID: <000001cc096b$c913a400$5b3aec00$@lehmann62@freenet.de>

I set the path with no spaces and ran as administrator, but the problem is not fixed. I'm not quite sure, but I can't remember having the problem before R version 2.13. Could it be that the new R version causes the problem?

Frank Lehmann

-----Original message-----
From: Uwe Ligges [mailto:ligges at statistik.tu-dortmund.de]
Sent: Monday, 2 May 2011 17:52
To: Frank Lehmann
Cc: r-help at r-project.org
Subject: Re: [R] problem with Sweave and pdflatex

Have you checked the permissions in the working directory?
Is there a blank in your path (LaTeX does not like spaces in the path). Uwe Ligges On 02.05.2011 14:51, Frank Lehmann wrote: > Hallo, > > > > when I plot figures with Sweave, I get the message "pdflatex: Permission > denied". This problem only occurs while working on local system. When I copy > the *.rnw-File to my AFS drive, there is no problem at all. > > > > Here is a small example: > > > > \documentclass{scrartcl} > > \usepackage[OT1]{fontenc} > > \usepackage[latin1]{inputenc} > > \usepackage[ngerman]{babel} > > \usepackage[pdftex]{graphicx} > > \usepackage{Sweave} > > > > \begin{document} > > > > \setkeys{Gin}{width=\textwidth} > > \begin{figure}[htbp] > > <>= > > x<- 1:10 > > plot(x) > > @ > > \caption{Eine einfache Grafik} > > \end{figure} > > > > \end{document} > > > > Does anyone have an idea, how to solve that problem? Im working with Windows > XP. > > > > Thanks! > > > > Frank > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From usha.nair at tcs.com Tue May 3 11:53:19 2011 From: usha.nair at tcs.com (Usha) Date: Tue, 3 May 2011 02:53:19 -0700 (PDT) Subject: [R] fitting distributions using fitdistr (MASS) Message-ID: <1304416399408-3492103.post@n4.nabble.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From crosspide at hotmail.com Tue May 3 12:29:37 2011 From: crosspide at hotmail.com (agent dunham) Date: Tue, 3 May 2011 03:29:37 -0700 (PDT) Subject: [R] delete excel id automatically generated Message-ID: <1304418577180-3492147.post@n4.nabble.com> Dear community, I uploaded an excel with read.xls. My xls file actually have a column which is an id, ("plot" is the id) : plot height area 34 7.6 5.4 85 3.2 4.1 89 5.4 8.4 121 6.7 6.2 ... 
1325 2.1 1.5 However R uses another id, this way: r id plot height area 1 34 7.6 5.4 2 85 3.2 4.1 3 89 5.4 8.4 4 121 6.7 6.2 ... 314 1325 2.1 1.5 I'd like that R used "plot" id because I delete some rows while studying regression, and R seems to be using the first id 1,2,3,4,...,314. Sometimes it's a mess to understand what R means in the plots when, for instance, states that data 200 is influential Thanks in advance, user at host.com -- View this message in context: http://r.789695.n4.nabble.com/delete-excel-id-automatically-generated-tp3492147p3492147.html Sent from the R help mailing list archive at Nabble.com. From shekhar2581 at gmail.com Tue May 3 13:44:03 2011 From: shekhar2581 at gmail.com (Shekhar) Date: Tue, 3 May 2011 04:44:03 -0700 (PDT) Subject: [R] How to fit a random data into Beta distribution? Message-ID: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com> Hi, I have some random data and i want to find out the parameters of Beta distribution ( a and b) such that this data approximately fits into this distribution. I have tried by plot the histograms and graph, but it requires lot of tuning and i am unable to do that. can anyone tell me how to do it programmitically in R? Regards, Som Shekhar From Stefan.Hoj-Edwards at agrsci.dk Tue May 3 14:12:22 2011 From: Stefan.Hoj-Edwards at agrsci.dk (=?utf-8?B?U3RlZmFuIE1jS2lubm9uIEjDuGotRWR3YXJkcw==?=) Date: Tue, 3 May 2011 14:12:22 +0200 Subject: [R] Problems with Rterm 2.13.0 - but not RGui In-Reply-To: References: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F69B@DJFEXMBX01.djf.agrsci.dk> <1C56F3EE22DF4F458FE02F58B96D4150555EA9F6B3@DJFEXMBX01.djf.agrsci.dk> Message-ID: <1C56F3EE22DF4F458FE02F58B96D4150555EA9F6EF@DJFEXMBX01.djf.agrsci.dk> I've changed to the hard path in the PATH environment variable in Windows and it works. If you are talking about the search path in R, I do not have a clue on how to test it. 
Regards,
Stefan

-----Original Message-----
From: Jonathan Daily [mailto:biomathjdaily at gmail.com]
Sent: 3 May 2011 13:59
To: Stefan McKinnon Høj-Edwards
Cc: r-help at r-project.org
Subject: Re: [R] Problems with Rterm 2.13.0 - but not RGui

Ah ok. I suppose the fix is to get the hard path (C:/Program Files/...) on the search path and remove the symlink from the search path. Does that work?

2011/5/3 Stefan McKinnon Høj-Edwards :
> Yes, the message is pretty clear, but it has nothing to do with running as admin.
> I have just tried to start a command line with admin privileges and the error still occurs.
> Regarding Rgui, I started it by opening the shortcut.
>
> Now I've tracked down the problem a bit, and the problem appears to be connected to which folder R is called from.
> And by sheer luck I've resolved the problem:
> In all previous versions of Windows, on the Danish editions, the "C:\Program Files" directory was called "C:\Programmer". This appears to be the case in Windows 7, but "C:\Programmer" is a symbolic link (hard/soft?) to "C:\Program Files". And apparently, I've been calling R from "C:\Programmer" instead of "C:\Program Files" which gave the problem.
> When/how/why I changed the PATH variable to the symbolic link is unclear, but a quick check reveals that the problem did not exist in R 2.12.1:
> C:\Programmer\R\R-2.12.1\bin\i386\Rterm  # No problem
> C:\Programmer\R\R-2.13.0\bin\i386\Rterm  # Problem
>
> I will submit a bug report on this.
>
> Kind regards,
> Stefan McKinnon Edwards
>
>
> -----Original Message-----
> From: Jonathan Daily [mailto:biomathjdaily at gmail.com]
> Sent: 2 May 2011 16:59
> To: Stefan McKinnon Høj-Edwards
> Cc: r-help at r-project.org
> Subject: Re: [R] Problems with Rterm 2.13.0 - but not RGui
>
> The message is pretty clear. Access denied means you don't have
> permission to access the path. This also explains why the packages
> fail to load - you don't have access to R's package library.
It most
> likely works on RGui because you are clicking it/running it as admin
> (you did not specify how you ran RGui).
>
> 2011/5/2 Stefan McKinnon Høj-Edwards :
>> Hi all,
>>
>> I have just installed R 2.13.0 and I am experiencing problems with the terminal, but not with the GUI interface.
>> I am on Windows 7.
>>
>> When running "R" or "Rterm" from a command line I receive the following:
>>
>> Warning message:
>> In normalizePath(path.expand(path), winslash, mustWork) :
>>   path[3]="C:/Programmer/R/R-2.13.0/library": Adgang nægtet
>>
>> R version 2.13.0 (2011-04-13)
>> Copyright (C) 2011 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: i386-pc-mingw32/i386 (32-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> Warning message:
>> package "methods" in options("defaultPackages") was not found
>> During startup - Warning messages:
>> 1: package 'datasets' in options("defaultPackages") was not found
>> 2: package 'utils' in options("defaultPackages") was not found
>> 3: package 'grDevices' in options("defaultPackages") was not found
>> 4: package 'graphics' in options("defaultPackages") was not found
>> 5: package 'stats' in options("defaultPackages") was not found
>> 6: package 'methods' in options("defaultPackages") was not found
>>
>>
>> Notice: "C:/Programmer/" is the Danish equivalent of "C:/Program Files".
>> The first error "Adgang nægtet" is directly translated to "Access denied".
>>
>> Any suggestions as to how to fix this?
>> >> Kind regards, >> Stefan McKinnon Edwards >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > =============================================== > Jon Daily > Technician > =============================================== > #!/usr/bin/env outside > # It's great, trust me. > -- =============================================== Jon Daily Technician =============================================== #!/usr/bin/env outside # It's great, trust me. From debruinjj at gmail.com Tue May 3 15:23:54 2011 From: debruinjj at gmail.com (Jurgens de Bruin) Date: Tue, 3 May 2011 15:23:54 +0200 Subject: [R] Categorical bubble plot In-Reply-To: References: <4DA845DD.6080008@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ypriverol at gmail.com Tue May 3 15:38:01 2011 From: ypriverol at gmail.com (ypriverol) Date: Tue, 3 May 2011 06:38:01 -0700 (PDT) Subject: [R] Bigining with a Program of SVR In-Reply-To: References: <1304106463512-3484476.post@n4.nabble.com> Message-ID: <1304429881697-3492487.post@n4.nabble.com> well, first of all thank for your answer. I need some example that works with Support Vector Regression. This is the format of my data: VDP V1 V2 .... 9.15 1234.5 10 9.15 2345.6 15 6.7 789.0 12 6.7 234.6 11 3.2 123.6 5 3.2 235.7 8 VDP is the experimental value of the property that i want to predict with the model and more accurate. The other variables V1, V2 ... are the properties to generate the model. I need some examples that introduce me in this field. I read some examples from e1071 but all of them are for classification problems. 
thanks for your help in advance -- View this message in context: http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3492487.html Sent from the R help mailing list archive at Nabble.com. From ligges at statistik.tu-dortmund.de Tue May 3 15:59:35 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 03 May 2011 15:59:35 +0200 Subject: [R] install rdcomclient source In-Reply-To: <1AD2AA98-CB8D-46B6-B24B-AC8B08E9DCED@gmail.com> References: <4DBFE657.8090509@statistik.tu-dortmund.de> <4DBFF2D2.1000906@statistik.tu-dortmund.de> <1AD2AA98-CB8D-46B6-B24B-AC8B08E9DCED@gmail.com> Message-ID: <4DC00A47.5050200@statistik.tu-dortmund.de> On 03.05.2011 15:05, Richard Wang wrote: > Thanks. I didn't know that. I just found it in Brian Ripley's page. Is this the cran extras? Right, and under Windows it is a default repository. Uwe Ligges > Thanks > Richard > > > > On 3 May 2011, at 13:19, Uwe Ligges wrote: > >> >> >> On 03.05.2011 14:09, Richard Wang wrote: >>> Thanks. One more question. If I use install.packsges, do I need to install Rtool or utils package is sufficient? >> >> >> It depends on how demanding the package is. For the one you mentioned, you will need the Rtools, since C/C++ sources are to be compiled. And it won't work out of the box, since some manual tweaks are required - at least the last time I tried. >> >> Do you know that the package is available from CRAN extras if form of a Windows binary? install.packages() without the type="source" argument should work right away. >> >> Uwe Ligges >> >> >>> Thanks, >>> Richard >>> >>> >>> On 3 May 2011, at 12:26, Uwe Ligges wrote: >>> >>>> >>>> >>>> On 02.05.2011 23:48, Richard Wang wrote: >>>>> Hi, >>>>> >>>>> I'd like to ask a installation question. I want to install a source code >>>>> through the following command, >>>>> R CMD INSTALL RDCOMClient >>>> >>>> >>>> This is intended to be used in the shell of your OS (assuming Windows given the package), not in R. 
>>>> >>>> From within R use install.packages("RDCOMClient", type="source") if you really want to install from source. >>>> >>>> Uwe Ligges >>>> >>>>> but get Error: unexpected symbol in "r cmd" >>>>> >>>>> Please let know if I miss anything. I my utils package loaded. >>>>> >>>>> Thanks, >>>>> Richard >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. From djsloan at liv.ac.uk Tue May 3 16:06:13 2011 From: djsloan at liv.ac.uk (dereksloan) Date: Tue, 3 May 2011 07:06:13 -0700 (PDT) Subject: [R] Generating summary statistics and simple statistical analysis from my data-set: how can I automate the analysis? Message-ID: <1304431573517-3492537.post@n4.nabble.com> I am fairly new to R and have a (for me) slightly complicated set of data to analyse. It contains several continuous and categorical variables for a group of individuals ? e.g; ID Sex Age Familysize Phone Education 1 M 23 3 Yes Primary 2 F 25 4 Yes Secondary 3 M 33 5 No Tertiary 4 F 45 1 Yes Secondary 5 F 67 10 Yes Secondary I want to summarise it in a table as follows; All individuals Male Female Comparison between sexes (I want to put p-values in this column) Age Median (range) Median (range) Median (range) Wilcoxon rank sum test Family size Median (range) Median (range) Median (range) Wilcoxon rank sum test Phone Number Yes (%) Number Yes (%) Number Yes (%) Chi-squared test Education Chi-squared test Primary Number (%) Number (%) Number (%) Secondary Number (%) Number (%) Number (%) Tertiary Number (%) Number (%) Number (%) How can I use R to do this? 
For the continuous variables I know I can write code like:

summary(Age)
by(Age,data["Sex"],summary)
wilcox.test(Age~Sex)
summary(Familysize)
by(Familysize,data["Sex"],summary)
wilcox.test(Familysize~Sex)

but is there any way of automating/looping the analysis so that I get summaries and comparative statistical analysis of all of the continuous variables in a single command? I'm sure this could be done by some kind of "looping" given that the analysis is always the same. Presumably I then still have to copy the output of interest (medians, ranges, p-values) into the summary table manually?

For each categorical variable I have really cumbersome code from which I can extract the information I need for the summary table, e.g.,

tphone<-xtabs(~Phone+Sex,data=data)
N<-margin.table(tphone,2)
tphone1<-rbind(tphone,N)
Total<-margin.table(tphone1,1)
tphone1<-cbind(tphone1,Total)
tphone1<-t(tphone1)
tphone1<-as.data.frame(tphone1)
tphone2<-within(tphone1,{
per.No<-100*(No/N)
per.Yes<-100*(Yes/N)
})
tphone2<-tphone2[,c(3,2,4,1,5)]
tphone2
chisq.test(tphone)

but there must be better ways of generating the counts, percentages, and simple statistical analysis which I need. Again, can I loop it to do all of my categorical variables at once?

Obviously my dataset has more continuous and categorical variables than those shown above, but I've abbreviated it for simplicity of explanation. I need to write simpler/looped code so that the whole thing is not crazily long-winded.

Sorry that my approach so far is so bad and long-winded! R is a long uphill curve to start with, so I'd be very grateful for any help I can get from anyone who won't laugh at me.

Derek

--
View this message in context: http://r.789695.n4.nabble.com/Generating-summary-statistics-and-simple-statistical-analysis-from-my-data-set-how-can-I-automate-th-tp3492537p3492537.html
Sent from the R help mailing list archive at Nabble.com.
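
[Editorial sketch, not part of the original thread.] One possible way to loop the analysis described in the post above is to keep the variable names in character vectors and apply the same function to each with lapply(). The data frame dat below is made up to mirror the columns in the post; the helper med_range() and the names cont_vars, cat_vars, cont_tab and cat_tabs are hypothetical. It assumes Sex has exactly two levels and uses only base R.

```r
# Made-up data shaped like the example in the post (an assumption).
set.seed(1)
dat <- data.frame(
  Sex        = factor(rep(c("M", "F"), each = 20)),
  Age        = sample(18:80, 40, replace = TRUE),
  Familysize = sample(1:10, 40, replace = TRUE),
  Phone      = factor(sample(c("Yes", "No"), 40, replace = TRUE)),
  Education  = factor(sample(c("Primary", "Secondary", "Tertiary"),
                             40, replace = TRUE))
)

cont_vars <- c("Age", "Familysize")   # continuous variables to summarise
cat_vars  <- c("Phone", "Education")  # categorical variables to tabulate

# Hypothetical helper: format "median (min-max)" for one numeric vector.
med_range <- function(x) sprintf("%g (%g-%g)", median(x), min(x), max(x))

# One row per continuous variable: overall, by sex, and Wilcoxon p-value.
cont_tab <- do.call(rbind, lapply(cont_vars, function(v) {
  x <- dat[[v]]
  data.frame(
    Variable = v,
    All      = med_range(x),
    Male     = med_range(x[dat$Sex == "M"]),
    Female   = med_range(x[dat$Sex == "F"]),
    p.value  = wilcox.test(x ~ dat$Sex, exact = FALSE)$p.value
  )
}))

# One list entry per categorical variable: counts, column %, chi-squared p.
cat_tabs <- lapply(setNames(cat_vars, cat_vars), function(v) {
  tab <- table(dat[[v]], dat$Sex)
  list(
    counts  = addmargins(tab, 2),                   # adds a "Sum" column
    percent = round(100 * prop.table(tab, 2), 1),   # % within each sex
    p.value = suppressWarnings(chisq.test(tab)$p.value)
  )
})

print(cont_tab)
print(cat_tabs$Phone)
```

The results still need to be arranged into the final table layout by hand or with further code; the point of the sketch is only that the per-variable work can be written once and applied to every column.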
From patrick.breheny at uky.edu Tue May 3 16:08:51 2011 From: patrick.breheny at uky.edu (Breheny, Patrick) Date: Tue, 3 May 2011 10:08:51 -0400 Subject: [R] Axis trouble In-Reply-To: References: Message-ID: <408338F86F0D4243BD5E7B74A8C0862B20569E19EA@EX7FM03.ad.uky.edu> The expression 0:g_range[2] is not meaningful. The : operator is for integers, while your data is continuous. Likely, you want something along the lines of axis(2, las=1, at=pretty(vecAVG)) _______________________ Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of swaraj basu Sent: Tuesday, May 03, 2011 9:13 AM To: r-help at r-project.org Subject: [R] Axis trouble Hello Everyone, I am having problem in defining specific axis for plotting a vactor. vecAVG <- c(0.2, 0.4, 0.6, 0.2, 0.4) names(vecAVG)<-c("brain","heart","kidney","lung","blood") par(mar=c(12,4.1,4.1, 2.1)) plot(sort(vecAVG,decreasing=TRUE),type="p",pch=19,col="darkslateblue",axes=FALSE,ann=FALSE) g_range<-range(vecAVG) axis(1,at=0:length(vecAVG),lab=names(vecAVG),las=2) axis(2, las=1, at=0:g_range[2]) After these commands I am getting the graph but it does not have any Y axis. I know I am making a silly mistake somewhere. Can someone please guide me. [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
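
[Editorial sketch, not part of the original thread.] The point made in the reply above can be checked directly: 0:g_range[2] is 0:0.6, which evaluates to the single value 0, and 0 lies outside the plotted y range, so no tick is drawn. The sketch below is one possible corrected version of the posted code. It also takes the x labels from the sorted vector, since the plotted values are sorted; that change is an editorial assumption about what was intended. The pdf(NULL) call is only there so the sketch runs without a display.

```r
vecAVG <- c(brain = 0.2, heart = 0.4, kidney = 0.6, lung = 0.2, blood = 0.4)
sorted <- sort(vecAVG, decreasing = TRUE)  # names travel with the values

pdf(NULL)  # null device: draw without opening a window or writing a file
par(mar = c(12, 4.1, 4.1, 2.1))
plot(sorted, type = "p", pch = 19, col = "darkslateblue",
     axes = FALSE, ann = FALSE)

# x axis: one tick per point, positions 1..n, labelled in sorted order
axis(1, at = seq_along(sorted), labels = names(sorted), las = 2)

# y axis: let pretty() pick tick positions that cover the data range
y_ticks <- pretty(vecAVG)
axis(2, at = y_ticks, las = 1)
box()
dev.off()
```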
From dwinsemius at comcast.net Tue May 3 16:10:39 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 3 May 2011 07:10:39 -0700 Subject: [R] Axis trouble In-Reply-To: References: Message-ID: <05575AA2-23F9-4E53-9597-CB1AF784E729@comcast.net> On May 3, 2011, at 6:12 AM, swaraj basu wrote: > Hello Everyone, > I am having problem in defining specific > axis for > plotting a vactor. > > vecAVG <- c(0.2, 0.4, 0.6, 0.2, 0.4) > > names(vecAVG)<-c("brain","heart","kidney","lung","blood") > > > par(mar=c(12,4.1,4.1, 2.1)) > > plot > (sort > (vecAVG > ,decreasing > =TRUE),type="p",pch=19,col="darkslateblue",axes=FALSE,ann=FALSE) > g_range<-range(vecAVG) > > > axis(1,at=0:length(vecAVG),lab=names(vecAVG),las=2) > axis(2, las=1, at=0:g_range[2]) > > After these commands I am getting the graph but it does not have any > Y axis. > I know I am making a silly mistake somewhere. Can someone please > guide me. I am guessing the you come from a different programming planet where vectors go from 0 upwards. This is the problem: at=0:length(vecAVG) Try: at=1:length(vecAVG) And you should post your error messages with your code. -- David Winsemius, MD Heritage Laboratories West Hartford, CT From dwinsemius at comcast.net Tue May 3 16:16:25 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 3 May 2011 07:16:25 -0700 Subject: [R] Axis trouble In-Reply-To: <05575AA2-23F9-4E53-9597-CB1AF784E729@comcast.net> References: <05575AA2-23F9-4E53-9597-CB1AF784E729@comcast.net> Message-ID: On May 3, 2011, at 7:10 AM, David Winsemius wrote: > > On May 3, 2011, at 6:12 AM, swaraj basu wrote: > >> Hello Everyone, >> I am having problem in defining specific >> axis for >> plotting a vactor. 
>> >> vecAVG <- c(0.2, 0.4, 0.6, 0.2, 0.4) >> >> names(vecAVG)<-c("brain","heart","kidney","lung","blood") >> >> >> par(mar=c(12,4.1,4.1, 2.1)) >> >> plot >> (sort >> (vecAVG >> ,decreasing >> =TRUE),type="p",pch=19,col="darkslateblue",axes=FALSE,ann=FALSE) >> g_range<-range(vecAVG) >> >> >> axis(1,at=0:length(vecAVG),lab=names(vecAVG),las=2) >> axis(2, las=1, at=0:g_range[2]) >> >> After these commands I am getting the graph but it does not have >> any Y axis. >> I know I am making a silly mistake somewhere. Can someone please >> guide me. > > I am guessing the you come from a different programming planet where > vectors go from 0 upwards. This is the problem: > > at=0:length(vecAVG) > > Try: > at=1:length(vecAVG) > > And you should post your error messages with your code. And the simple fix to the second problem is just: axis(2, las=1) > > -- > David Winsemius, MD > Heritage Laboratories > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT From mxkuhn at gmail.com Tue May 3 16:24:37 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Tue, 3 May 2011 10:24:37 -0400 Subject: [R] Bigining with a Program of SVR In-Reply-To: <1304429881697-3492487.post@n4.nabble.com> References: <1304106463512-3484476.post@n4.nabble.com> <1304429881697-3492487.post@n4.nabble.com> Message-ID: See the examples at the end of: http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf for a QSAR data set for modeling the log blood-brain barrier concentration. SVMs are not used there but, if you use train(), the syntax is very similar. On Tue, May 3, 2011 at 9:38 AM, ypriverol wrote: > well, first of all thank for your answer. 
I need some example that works with
> Support Vector Regression. This is the format of my data:
>  VDP    V1      V2  ....
>  9.15   1234.5  10
>  9.15   2345.6  15
>  6.7    789.0   12
>  6.7    234.6   11
>  3.2    123.6   5
>  3.2    235.7   8
>
> VDP is the experimental value of the property that i want to predict with
> the model and more accurate. The other variables V1, V2 ... are the
> properties to generate the model. I need some examples that introduce me in
> this field. I read some examples from e1071 but all of them are for
> classification problems.
>
> thanks for your help in advance
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3492487.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--

Max

From ripley at stats.ox.ac.uk Tue May 3 16:36:45 2011
From: ripley at stats.ox.ac.uk (Prof Brian Ripley)
Date: Tue, 3 May 2011 15:36:45 +0100 (BST)
Subject: [R] fitting distributions using fitdistr (MASS)
In-Reply-To: <1304416399408-3492103.post@n4.nabble.com>
References: <1304416399408-3492103.post@n4.nabble.com>
Message-ID:

On Tue, 3 May 2011, Usha wrote:

> Please guide me through to resolve the error message that I get
>
> this is what i have done.
>
>> x1<- rnorm(100,2,1)
>> x1fitbeta<-fitdistr(x1,"beta")
> Error in fitdistr(x1, "beta") : 'start' must be a named list

You have many errors, starting with not reading the posting guide. But the main ones seem to be:

(A) A beta distribution has support (0,1). I bet your data are not confined to that interval.
If true (and you failed to give the reproducible example the posting guide asked for), then the log-likelihood is -Inf ('not finite') for any value of the parameters. fitdistr() is support software for a book: it is not a tutorial on the basics of maximum-likelihood estimation.

(B) As the help says

     For the following named distributions, reasonable starting values
     will be computed if 'start' is omitted or only partially
     specified: '"cauchy"', '"gamma"', '"logistic"', '"negative
     binomial"' (parametrized by 'mu' and 'size'), '"t"' and
     '"weibull"'.

Since beta is not one of those, you need to specify starting values.

> Yes, I do understand that sometime for the distribution to converge to the
> given set of data, it requires initial parameters of the distribution, to
> start off with. Hence, i tried this
>
>> x1fitbeta<-fitdistr(x1,densfun=dbeta, start=list(shape1=2,shape2=3))
> Error in optim(x = c(1.89074018737135, 1.52649293971978, 2.19950245230280,
> :
> initial value in 'vmmin' is not finite
>
> I tried with "f" and "chi-square" what i did with "t". Please find below
> the output.
>
>> x1fitt<-fitdistr(x1,"t")
> Error in fitdistr(x1, "t") : optimization failed
> In addition: Warning messages:
> 1: In log(s) : NaNs produced
> 2: In log(s) : NaNs produced
> 3: In log(s) : NaNs produced
> 4: In log(s) : NaNs produced
> 5: In log(s) : NaNs produced
> 6: In log(s) : NaNs produced
>
>> x1fitt<-fitdistr(x1,"t", df=1)
> Warning messages:
> 1: In log(s) : NaNs produced
> 2: In log(s) : NaNs produced
>
>
>> x1fitf<-fitdistr(x1,"f",start=list(df1=2,df2=3))
> Warning message:
> In df(x, df1, df2, log) : NaNs produced
>> x1fitf
>       df1         df2
>   5.6733242   4.4962519
>  (1.3407776) (0.9016752)

No guarantee that your x1 values are positive, either.

>
>> x1fitchi<-fitdistr(x1,"chi-squared",df=3)
> Error in fitdistr(x1, "chi-squared", df = 3) :
> 'start' must be a named list
>
> It is the same as what i gave for beta?!!
> >> x1fitbeta<-fitdistr(x1,"beta", start=list(shape1=2,shape2=3)) > Error in optim(x = c(1.89074018737135, 1.52649293971978, 2.19950245230280, > : > initial value in 'vmmin' is not finite > > What is the right syntax....why do I get error for only some, what are the > exceptions? > I dont know how rectify this error. please resolve > > Thanks in advance. > > -- > View this message in context: http://r.789695.n4.nabble.com/fitting-distributions-using-fitdistr-MASS-tp3492103p3492103.html > Sent from the R help mailing list archive at Nabble.com. > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From rvaradhan at jhmi.edu Tue May 3 16:37:16 2011 From: rvaradhan at jhmi.edu (Ravi Varadhan) Date: Tue, 3 May 2011 10:37:16 -0400 Subject: [R] fitting distributions using fitdistr (MASS) In-Reply-To: <1304416399408-3492103.post@n4.nabble.com> References: <1304416399408-3492103.post@n4.nabble.com> Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669E9F31E@WIGGUMVS.win.ad.jhu.edu> Your simulation example is bad. You cannot fit a beta distribution to a data that is not in [0,1], leave alone negative data. x <- runif(1007) fitdistr(x, "beta", start=list(shape1=0.5, shape2=0.5)) But try this instead: x <- runif(100, 1, 27) fitdistr(x, "beta", start=list(shape1=0.5, shape2=0.5)) It seems that when "beta" and "chi-squared" are specified as the distribution to be estimated, the "start" parameter for optimization have to be specified by the user. 
The code does not use any default starting value. Ravi. ________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Usha [usha.nair at tcs.com] Sent: Tuesday, May 03, 2011 5:53 AM To: r-help at r-project.org Subject: [R] fitting distributions using fitdistr (MASS) Please guide me through to resolve the error message that I get this is what i have done. >x1<- rnorm(100,2,1) >x1fitbeta<-fitdistr(x1,"beta") Error in fitdistr(x1, "beta") : 'start' must be a named list Yes, I do understand that sometime for the distribution to converge to the given set of data, it requires initial parameters of the distribution, to start off with. Hence, i tried this >x1fitbeta<-fitdistr(x1,densfun=dbeta, start=list(shape1=2,shape2=3)) Error in optim(x = c(1.89074018737135, 1.52649293971978, 2.19950245230280, : initial value in 'vmmin' is not finite I tried with "f" and "chi-square" what i did with "t". Please find below the output. > x1fitt<-fitdistr(x1,"t") Error in fitdistr(x1, "t") : optimization failed In addition: Warning messages: 1: In log(s) : NaNs produced 2: In log(s) : NaNs produced 3: In log(s) : NaNs produced 4: In log(s) : NaNs produced 5: In log(s) : NaNs produced 6: In log(s) : NaNs produced > x1fitt<-fitdistr(x1,"t", df=1) Warning messages: 1: In log(s) : NaNs produced 2: In log(s) : NaNs produced > x1fitf<-fitdistr(x1,"f",start=list(df1=2,df2=3)) Warning message: In df(x, df1, df2, log) : NaNs produced > x1fitf df1 df2 5.6733242 4.4962519 (1.3407776) (0.9016752) >x1fitchi<-fitdistr(x1,"chi-squared",df=3) Error in fitdistr(x1, "chi-squared", df = 3) : 'start' must be a named list It is the same as what i gave for beta?!! > x1fitbeta<-fitdistr(x1,"beta", start=list(shape1=2,shape2=3)) Error in optim(x = c(1.89074018737135, 1.52649293971978, 2.19950245230280, : initial value in 'vmmin' is not finite What is the right syntax....why do I get error for only some, what are the exceptions? 
I dont know how rectify this error. please resolve Thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/fitting-distributions-using-fitdistr-MASS-tp3492103p3492103.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From bt_jannis at yahoo.de Tue May 3 17:27:32 2011 From: bt_jannis at yahoo.de (Jannis) Date: Tue, 3 May 2011 16:27:32 +0100 (BST) Subject: [R] loading only parts of RData files? Message-ID: <39448.25283.qm@web28206.mail.ukl.yahoo.com> Dear List members, I would like to load R objects saved as RData file but ran into the problem that these objects are too large for my RAM ('Can not allocate vactor of size XX...'). Switching to a Linux machine is no option, neither is raising the memory limit. I am now wondering whether it is possible to load only specific single objects saved in these *.RData files into the workspace. As these objects usually are multidimensional arrays, ideally I would be able to load only specific dimensions of these arrays. IS there any possibility to do such in R? I know that NCDF files (and the RNetCDF package) provides possibilities to only read specific parts of the data into the memory, but I switching to this format would involve a lot of hassle so I would prefer to stay with something as convenient as save() and load() without switching to non R file formats. 
Thanks for any suggestions Jannis From rob.cassidy at concordia.ca Tue May 3 17:06:26 2011 From: rob.cassidy at concordia.ca (Rob Cassidy) Date: Tue, 3 May 2011 08:06:26 -0700 (PDT) Subject: [R] Help converting a data.frame to ordered factors In-Reply-To: References: Message-ID: <1304435186239-3492705.post@n4.nabble.com> Thanks, Phil. I could have sworn that I tried that (several times). It works perfectly, of course. Thanks again, Robert -- View this message in context: http://r.789695.n4.nabble.com/Help-converting-a-data-frame-to-ordered-factors-tp3490838p3492705.html Sent from the R help mailing list archive at Nabble.com. From ypriverol at gmail.com Tue May 3 17:19:57 2011 From: ypriverol at gmail.com (ypriverol) Date: Tue, 3 May 2011 08:19:57 -0700 (PDT) Subject: [R] Bigining with a Program of SVR In-Reply-To: References: <1304106463512-3484476.post@n4.nabble.com> <1304429881697-3492487.post@n4.nabble.com> Message-ID: <1304435997731-3492746.post@n4.nabble.com> I saw the format of the caret data some days ago. It is possible to convert my csv data with the same data a format as the caret dataset. My idea is to use firstly the same scripts as caret tutorial, then i want to remove problems related with data formats and incompatibilities. Thanks for your time -- View this message in context: http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3492746.html Sent from the R help mailing list archive at Nabble.com. From alaios at yahoo.com Tue May 3 17:32:18 2011 From: alaios at yahoo.com (Alaios) Date: Tue, 3 May 2011 08:32:18 -0700 (PDT) Subject: [R] Sum the cell of a vector Message-ID: <118826.47021.qm@web120110.mail.ne1.yahoo.com> Dear all, I would like to know what is the most time efficient way to calculate the following in a huge vector. 
Let's say that I have the vector 1,2,3,4,5,6 and I want to return a vector of the same length which every cell containing the sum of the previous cells like this 1, 1+2, 1+2+3, 1+2+3+4, 1+2+3+4+5, 1+2+3+4+5+6 What do you think is the most effective way to do that (not having to do the previous sum). Of course I can do this with a for loop but I believe that might be a more efficient way in R. I would like to thank you in advance for your help Best Regards Alex From biomathjdaily at gmail.com Tue May 3 17:39:38 2011 From: biomathjdaily at gmail.com (Jonathan Daily) Date: Tue, 3 May 2011 11:39:38 -0400 Subject: [R] Sum the cell of a vector In-Reply-To: <118826.47021.qm@web120110.mail.ne1.yahoo.com> References: <118826.47021.qm@web120110.mail.ne1.yahoo.com> Message-ID: ?cumsum On Tue, May 3, 2011 at 11:32 AM, Alaios wrote: > Dear all, > I would like to know what is the most time efficient way to calculate the following in a huge vector. > > Let's say that I have the vector > > 1,2,3,4,5,6 > and I want to return a vector of the same length which every cell containing the sum of the previous cells like this > > 1, 1+2, 1+2+3, 1+2+3+4, 1+2+3+4+5, 1+2+3+4+5+6 > > What do you think is the most effective way to do that (not having to do the previous sum). > Of course I can do this with a for loop but I believe that might be a more efficient way in R. > > I would like to thank you in advance for your help > > Best Regards > Alex > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- =============================================== Jon Daily Technician =============================================== #!/usr/bin/env outside # It's great, trust me. 
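
[Editorial sketch, not part of the original thread.] The one-word answer above points at cumsum(), which returns the running total of a vector in a single vectorised call. The short comparison below uses the vector from the question and checks it against an explicit loop; the names running, loop_sum and acc are only for illustration.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Vectorised running total: element i is x[1] + ... + x[i]
running <- cumsum(x)

# Equivalent explicit loop, carrying the previous sum forward
loop_sum <- numeric(length(x))
acc <- 0
for (i in seq_along(x)) {
  acc <- acc + x[i]
  loop_sum[i] <- acc
}

print(running)                 # 1 3 6 10 15 21
identical(running, loop_sum)   # TRUE
```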
From utz.ryan at gmail.com Tue May 3 18:36:22 2011 From: utz.ryan at gmail.com (Ryan Utz) Date: Tue, 3 May 2011 10:36:22 -0600 Subject: [R] Controlling the extent of ablines on plot Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jerome.asselin.stat at gmail.com Tue May 3 19:10:35 2011 From: jerome.asselin.stat at gmail.com (Jerome Asselin) Date: Tue, 03 May 2011 13:10:35 -0400 Subject: [R] Controlling the extent of ablines on plot In-Reply-To: References: Message-ID: <1304442635.25057.32.camel@localhost> On Tue, 2011-05-03 at 10:36 -0600, Ryan Utz wrote: > Hi all, > > I'm attempting to make a quite-specific plot where the axes cross at the > origin and with gridlines for guidance. I've been using ablines to create > the reference lines because I want a lot of control as to where they are > placed on the axis. This command works very well for such control. > However... > > These ablines don't seem to work when I specify the origin as 0,0. They go > beyond the x-axis at both ends, rendering a quite ugly graph (and they push > back the y-axis title some). > > Behold: > > ### > x<-c(0,1,2,3,4,5) > y<-c(0,2,4,6,8,10) > > plot(x,y, axes=FALSE) > axis(1,at=c(0,1,2,2.5,3,4,5),pos=0) > axis(2,at=c(0,2,4,6,8,10),pos=0) > abline(h=c(1,2,3,4,5)) > ### > > Is there any way for me to specify that these ablines should not go beyond > the y-axis extent? I just want a pretty graph! > > Thanks! > Ryan > Maybe you could use lines()? plot(1:2) lines(c(1,2),c(1.5,1.5)) From chschulz at email.de Tue May 3 19:21:09 2011 From: chschulz at email.de (Christian Schulz) Date: Tue, 03 May 2011 19:21:09 +0200 Subject: [R] loading only parts of RData files? In-Reply-To: <39448.25283.qm@web28206.mail.ukl.yahoo.com> References: <39448.25283.qm@web28206.mail.ukl.yahoo.com> Message-ID: <4DC03985.4030108@email.de> Hi, g.data is perhaps a interesting package for you. 
HTH, Christian > Dear List members, > > > I would like to load R objects saved as RData file but ran into the problem that these objects are too large for my RAM ('Can not allocate vactor of size XX...'). Switching to a Linux machine is no option, neither is raising the memory limit. > > I am now wondering whether it is possible to load only specific single objects saved in these *.RData files into the workspace. As these objects usually are multidimensional arrays, ideally I would be able to load only specific dimensions of these arrays. IS there any possibility to do such in R? > > I know that NCDF files (and the RNetCDF package) provides possibilities to only read specific parts of the data into the memory, but I switching to this format would involve a lot of hassle so I would prefer to stay with something as convenient as save() and load() without switching to non R file formats. > > > Thanks for any suggestions > Jannis > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From utz.ryan at gmail.com Tue May 3 19:26:53 2011 From: utz.ryan at gmail.com (Ryan Utz) Date: Tue, 3 May 2011 11:26:53 -0600 Subject: [R] Controlling the extent of ablines on plot In-Reply-To: <1304442635.25057.32.camel@localhost> References: <1304442635.25057.32.camel@localhost> Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available
URL: 

From hcatbr at yahoo.co.in Tue May 3 18:16:08 2011
From: hcatbr at yahoo.co.in (HC)
Date: Tue, 3 May 2011 09:16:08 -0700 (PDT)
Subject: [R] adaptIntegrate - how to pass additional parameters to the integrand
In-Reply-To: <4DBFE431.3070904@statistik.tu-dortmund.de>
References: <1304397791609-3491701.post@n4.nabble.com> <4DBFE431.3070904@statistik.tu-dortmund.de>
Message-ID: <1304439368993-3492903.post@n4.nabble.com>

Dr. Ligges,

Thanks a lot for providing the syntax for passing additional parameters. It worked for me and has solved my problem.

Many thanks for your quick help.
HC

--
View this message in context: http://r.789695.n4.nabble.com/adaptIntegrate-how-to-pass-additional-parameters-to-the-integrand-tp3491701p3492903.html
Sent from the R help mailing list archive at Nabble.com.

From rob.cassidy at concordia.ca Tue May 3 18:51:47 2011
From: rob.cassidy at concordia.ca (Rob Cassidy)
Date: Tue, 3 May 2011 09:51:47 -0700 (PDT)
Subject: [R] Help converting a data.frame to ordered factors
In-Reply-To: <1304435186239-3492705.post@n4.nabble.com>
References: <1304435186239-3492705.post@n4.nabble.com>
Message-ID: <1304441507167-3492979.post@n4.nabble.com>

Hi again,

Now that I have the data.frame as ordered factors, when I try to transpose it, I lose the factor ordering.

> datfact<-data.frame(c1,c2,c96)
> sapply(datfact, class)
     c1        c2        c96
[1,] "ordered" "ordered" "ordered"
[2,] "factor"  "factor"  "factor"
>
> datfacT<-as.data.frame(t(datfact))
> sapply(datfacT,class)
   item1    item2    item3    item4 ...
"factor" "factor" "factor" "factor" ...
>
>

Is there a simple way to re-impose the ordering on the transposed df? Or, alternatively, to transpose the df without losing the ordering?

Many thanks,
Robert

--
View this message in context: http://r.789695.n4.nabble.com/Help-converting-a-data-frame-to-ordered-factors-tp3490838p3492979.html
Sent from the R help mailing list archive at Nabble.com.
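
One possible way to approach the question above, offered only as a sketch and not as an answer taken from the thread: t() converts a data frame to a matrix first, and a matrix of factors becomes a character matrix, so the ordering has to be re-applied afterwards. The data, the level set lev, and the assumption that every item shares the same ordered levels are all hypothetical.

```r
# Hypothetical data: three respondents (rows) rating two items on a shared scale.
lev <- c("low", "mid", "high")
datfact <- data.frame(
  c1 = factor(c("low", "high", "mid"), levels = lev, ordered = TRUE),
  c2 = factor(c("mid", "mid", "high"), levels = lev, ordered = TRUE)
)

# t() goes through as.matrix(), which yields a character matrix here,
# so the factor ordering is lost at this step.
datfacT <- as.data.frame(t(datfact), stringsAsFactors = FALSE)

# Re-impose the shared ordering on every column of the transposed data frame.
datfacT[] <- lapply(datfacT, factor, levels = lev, ordered = TRUE)

print(sapply(datfacT, is.ordered))
```

If the items do not share one set of levels, a single levels vector cannot be applied to the transposed columns, and the approach would need rethinking.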
From sbroad00 at gmail.com Tue May 3 17:36:43 2011 From: sbroad00 at gmail.com (Stuart) Date: Tue, 3 May 2011 08:36:43 -0700 (PDT) Subject: [R] data transformation ----Box-Cox Transformations Message-ID: <42f9b7cf-b628-42ba-845b-8d7153a5c337@q32g2000yqn.googlegroups.com> Hi Could any one please help how I can trnasform data based on Box-Cox Transformations. I have massive data set with many variables. If possible someone can write few lines so I can read in all data set once and transform it. g1 g2 g2 97.03703704 89.25925926 4.444444444 24.90740741 69.25925926 35.55555556 62.22222222 85.18518519 36.85185185 18.51851852 84.25925926 21.66666667 93.7037037 95.92592593 54.07407407 26.66666667 23.33333333 99.25925926 63.33333333 97.03703704 27.40740741 95.74074074 3.611111111 59.25925926 46.66666667 49.44444444 39.16666667 21.85185185 2.592592593 63.14814815 94.72222222 17.77777778 81.11111111 any help will be much appreciated Cheers Sbroad From Vanselow at gmx.de Tue May 3 18:10:52 2011 From: Vanselow at gmx.de (Kim Vanselow) Date: Tue, 03 May 2011 18:10:52 +0200 Subject: [R] step.gam with a list of data frames Message-ID: <20110503161052.48540@gmx.net> Dear R-helpers, I used the step.gam function (package gam, Trevor Hastie) on a data frame without problems. Then I created a list of several bootstrap samples from this data frame. Now I want to use the step.gam function on this list using a for-loop. The code is working well until the step.gam function starts. Step.gam is starting but then suddenly breaks (see error below). However, if I replace the for-loop (see below) the whole thing is working well. 
for-loop with which the code does not work: for(i in 1:4){ fit_list[[i]] <- gam(dom_teresken ~ 1, family = poisson, data = Pamir_res[[i]]) } Replace this with: fit_list[[1]] <- gam(dom_teresken ~ 1, family = poisson, data = Pamir_res[[1]]) fit_list[[2]] <- gam(dom_teresken ~ 1, family = poisson, data = Pamir_res[[2]]) fit_list[[3]] <- gam(dom_teresken ~ 1, family = poisson, data = Pamir_res[[3]]) fit_list[[4]] <- gam(dom_teresken ~ 1, family = poisson, data = Pamir_res[[4]]) Now everything works well. But I have to use the loop as I want to use much more than only 4 bootstrap samples. This is the error message: Start: dom_teresken ~ 1; AIC= 516.6801 Trial: dom_teresken ~ ALTITUDE + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1; AIC= 512.2938 Trial: dom_teresken ~ 1 + SLOPE + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1; AIC= 495.0212 Trial: dom_teresken ~ 1 + 1 + SOUTH_EXPOSEDNESS + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1; AIC= 486.8359 Trial: dom_teresken ~ 1 + 1 + 1 + WEST_EXPOSEDNESS + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1; AIC= 529.8005 Trial: dom_teresken ~ 1 + 1 + 1 + 1 + DISTANCE_TO_ISOBATH + 1 + 1 + 1 + 1 + 1 + 1 + 1Error in Pamir_res[[i]] : subscript out of bounds This is my full code: # take bootstrap samples require(MIfuns) # resample functions Pamir_res <- resample(Pamir, names= 1:4, replace=TRUE) # take n bootstrap # samples from data = Pamir # calculate gam for each bootstrap sample fit_list <- vector(4, mode = "list") # create empty list for(i in 1:4){ fit_list[[i]] <- gam(dom_teresken ~ 1, family = poisson, data = Pamir_res[[i]]) } gam.scope = list( ~ 1 + ALTITUDE + s(ALTITUDE, 2), ~ 1 + SLOPE + s(SLOPE, 2), ~ 1 + SOUTH_EXPOSEDNESS + s(SOUTH_EXPOSEDNESS, 2), ~ 1 + WEST_EXPOSEDNESS + s(WEST_EXPOSEDNESS, 2), ~ 1 + DISTANCE_TO_ISOBATH + s(DISTANCE_TO_ISOBATH, 2), ~ 1 + UTM_NORTHING + s(UTM_NORTHING, 2), ~ 1 + UTM_EASTING + s(UTM_EASTING, 2), ~ 1 + DISTANCE_TO_SETTLEMENT, s(DISTANCE_TO_SETTLEMENT, 2), ~ 1 + NDVI_Rededge_R_Mean + s(NDVI_Rededge_R_Mean, 2), ~ 1 + 
NDVI_IR_Rededge_t_Mean + s(NDVI_IR_Rededge_t_Mean, 2), ~ 1 + Bd_5_Mean + s(Bd_5_Mean, 2), ~ 1 + Bd_1t_Mean + s(Bd_1t_Mean, 2)) fit_boot <- vector(4, mode = "list") # create empty list for(i in 1:4){ fit_boot[[i]] <- step.gam(fit_list[[i]], scope = gam.scope, direction = "both", trace = TRUE) } I really tried to find suggestions on the internet and in nabble. Unfortunately I could not solve the problem. Please help me! Thank you very much, Kim -- From Greg.Snow at imail.org Tue May 3 19:28:13 2011 From: Greg.Snow at imail.org (Greg Snow) Date: Tue, 3 May 2011 11:28:13 -0600 Subject: [R] Controlling the extent of ablines on plot In-Reply-To: References: Message-ID: Check your par() settings, specifically "xpd". For more control see ?clip. If that does not do enough for you then use lines or segments for complete control. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of Ryan Utz > Sent: Tuesday, May 03, 2011 10:36 AM > To: r-help at r-project.org > Subject: [R] Controlling the extent of ablines on plot > > Hi all, > > I'm attempting to make a quite-specific plot where the axes cross at > the > origin and with gridlines for guidance. I've been using ablines to > create > the reference lines because I want a lot of control as to where they > are > placed on the axis. This command works very well for such control. > However... > > These ablines don't seem to work when I specify the origin as 0,0. They > go > beyond the x-axis at both ends, rendering a quite ugly graph (and they > push > back the y-axis title some). 
> > Behold: > > ### > x<-c(0,1,2,3,4,5) > y<-c(0,2,4,6,8,10) > > plot(x,y, axes=FALSE) > axis(1,at=c(0,1,2,2.5,3,4,5),pos=0) > axis(2,at=c(0,2,4,6,8,10),pos=0) > abline(h=c(1,2,3,4,5)) > ### > > Is there any way for me to specify that these ablines should not go > beyond > the y-axis extent? I just want a pretty graph! > > Thanks! > Ryan > > -- > > Ryan Utz, Ph.D. > Aquatic Ecologist/STREON Scientist > National Ecological Observatory Network > > Home/Cell: (724) 272-7769 > Work: (720) 746-4844 ext. 2488 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From w.gostner at ipp.bz.it Tue May 3 17:44:31 2011 From: w.gostner at ipp.bz.it (Woida71) Date: Tue, 3 May 2011 08:44:31 -0700 (PDT) Subject: [R] Simple loop Message-ID: <1304437471793-3492819.post@n4.nabble.com> Hello everybody, I am beginning with loops and functions and would be glad to have help in the following question: If i have a dataframe like this Site Prof H 1 1 24 1 1 16 1 1 67 1 2 23 1 2 56 1 2 45 2 1 67 2 1 46 And I would like to create a new column that subtracts the minimum of H from H, but for S1 and P1 only the minimum of the data points falling into this category should be taken. So for example the three first numbers of the new column write: 24-16, 16-16, 67-16 the following numbers refering to Site1 and Prof2 write: 23-23, 56-23, 45-23. I think with two loops one refering to the Site, the other to the Prof, it should be possible to automatically create the new column. Thanks a lot for any help. -- View this message in context: http://r.789695.n4.nabble.com/Simple-loop-tp3492819p3492819.html Sent from the R help mailing list archive at Nabble.com. 
From jackt at fpc.org Tue May 3 19:28:33 2011
From: jackt at fpc.org (Jack T.)
Date: Tue, 3 May 2011 10:28:33 -0700 (PDT)
Subject: [R] RODBC: forcing a special column to be read in as character
In-Reply-To: 
References: 
Message-ID: <1304443713017-3493081.post@n4.nabble.com>

I've had the same problem and ended up using the xlsReadWrite package. It takes more time to import a sheet, but it does have the colClasses argument. Following your example:

library(xlsReadWrite)
read.xls("testtable", sheet = "sheet1", colClasses="character")

should work; it did for me.

--
View this message in context: http://r.789695.n4.nabble.com/RODBC-forcing-a-special-column-to-be-read-in-as-character-tp2993624p3493081.html
Sent from the R help mailing list archive at Nabble.com.

From Greg.Snow at imail.org Tue May 3 19:33:55 2011
From: Greg.Snow at imail.org (Greg Snow)
Date: Tue, 3 May 2011 11:33:55 -0600
Subject: [R] data transformation ----Box-Cox Transformations
In-Reply-To: <42f9b7cf-b628-42ba-845b-8d7153a5c337@q32g2000yqn.googlegroups.com>
References: <42f9b7cf-b628-42ba-845b-8d7153a5c337@q32g2000yqn.googlegroups.com>
Message-ID: 

There is the bct function in the TeachingDemos package that does Box-Cox transforms (though you could also write your own fairly simply). The lapply/sapply functions will apply a function to each column of a data frame.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Stuart
> Sent: Tuesday, May 03, 2011 9:37 AM
> To: r-help at r-project.org
> Subject: [R] data transformation ----Box-Cox Transformations
>
> Hi
>
> Could any one please help how I can trnasform data based on Box-Cox
> Transformations. I have massive data set with many variables. If
> possible someone can write few lines so I can read in all data set
> once and transform it.
> > > g1 g2 g2 > 97.03703704 89.25925926 4.444444444 > 24.90740741 69.25925926 35.55555556 > 62.22222222 85.18518519 36.85185185 > 18.51851852 84.25925926 21.66666667 > 93.7037037 95.92592593 54.07407407 > 26.66666667 23.33333333 99.25925926 > 63.33333333 97.03703704 27.40740741 > 95.74074074 3.611111111 59.25925926 > 46.66666667 49.44444444 39.16666667 > 21.85185185 2.592592593 63.14814815 > 94.72222222 17.77777778 81.11111111 > > > any help will be much appreciated > > Cheers > Sbroad > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From bioprogrammer at gmail.com Tue May 3 19:35:41 2011 From: bioprogrammer at gmail.com (Caitlin) Date: Tue, 3 May 2011 10:35:41 -0700 Subject: [R] Constructing a histogram with words as labels as height as frequency? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jwiley.psych at gmail.com Tue May 3 19:40:47 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 3 May 2011 10:40:47 -0700 Subject: [R] Controlling the extent of ablines on plot In-Reply-To: References: <1304442635.25057.32.camel@localhost> Message-ID: Hi Ryan, The issue is the plotting region is slightly padded. The easiest option, I think, would be to clip() it. I have a general sense that one of the par() options would let you adjust the padding to 0, but I could just be imagining that (anyone else??). 
Anyway, here are some options:

###
plot(0:5, seq(0, 10, 2), axes=FALSE)
axis(1, at=c(0,1,2,2.5,3,4,5), pos=0)
axis(2, at=seq(0, 10, 2), pos=0)
## using clip and abline
clip(0, 5, 0, 10)
abline(h = 1:5)

dev.new()
plot(0:5, seq(0, 10, 2), axes=FALSE)
axis(1, at=c(0,1,2,2.5,3,4,5), pos=0)
axis(2, at=seq(0, 10, 2), pos=0)
## using lines, but without retyping as much
sapply(1:5, function(y) lines(c(0, 5), c(y, y)))
## or even easier, getting the same thing with segments
segments(0, 1:5, 5, 1:5)

HTH,

Josh

On Tue, May 3, 2011 at 10:26 AM, Ryan Utz wrote:
> Well... that could work. Problem is in the actual graphs I'm making, there
> are to be >30 lines per graph (as many as 60 in some cases). Any way I could
> use the lines command without having to write out 60 lines of code per
> figure? That's why I like ablines; you just have to specify a single value
> and it will put a horizontal line at that number.
>
> Thanks,
> Ryan
>
>
>> Maybe you could use lines()?
>>
>> plot(1:2)
>> lines(c(1,2),c(1.5,1.5))
>>
>>
>>
>
>
> --
>
> Ryan Utz, Ph.D.
> Aquatic Ecologist/STREON Scientist
> National Ecological Observatory Network
>
> Home/Cell: (724) 272-7769
> Work: (720) 746-4844 ext. 2488
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Joshua Wiley Ph.D.
Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

From jwiley.psych at gmail.com Tue May 3 19:50:19 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Tue, 3 May 2011 10:50:19 -0700
Subject: [R] Controlling the extent of ablines on plot
In-Reply-To: 
References: <1304442635.25057.32.camel@localhost>
Message-ID: 

On Tue, May 3, 2011 at 10:40 AM, Joshua Wiley wrote:
> The issue is the plotting region is slightly padded.  The easiest
> option, I think, would be to clip() it.  I have a general sense that
> one of the par() options would let you adjust the padding to 0, but I
> could just be imagining that (anyone else??).

Not loving it, but this is sort of what I meant...

plot(0:5, seq(0, 10, 2), axes=FALSE, xaxs = "i", yaxs = "i")
axis(1, at=c(0,1,2,2.5,3,4,5), pos=0)
axis(2, at=seq(0, 10, 2), pos=0)
abline(h = 1:5)

From Greg.Snow at imail.org Tue May 3 19:52:54 2011
From: Greg.Snow at imail.org (Greg Snow)
Date: Tue, 3 May 2011 11:52:54 -0600
Subject: [R] Constructing a histogram with words as labels as height as frequency?
In-Reply-To: 
References: 
Message-ID: 

?barplot

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Caitlin
> Sent: Tuesday, May 03, 2011 11:36 AM
> To: r-help at r-project.org
> Subject: [R] Constructing a histogram with words as labels as height as
> frequency?
>
> Hi all.
>
> I need to construct a plot showing words on the x-axis and how many
> times
> each word was given as a verbal response on the y-axis as solid bar
> (frequency). Is there a convenient function to do this in R? I
> considered
> hist(), but I'm not sure how to construct the text file.
Example: > > apple, 2 > pear, 14 > house, 1 > beach, 5 > computer, 15 > > Thanks, > > ~Caitlin > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From rvaradhan at jhmi.edu Tue May 3 19:57:39 2011 From: rvaradhan at jhmi.edu (Ravi Varadhan) Date: Tue, 3 May 2011 13:57:39 -0400 Subject: [R] adaptIntegrate - how to pass additional parameters to the integrand In-Reply-To: <4DBFE431.3070904@statistik.tu-dortmund.de> References: <1304397791609-3491701.post@n4.nabble.com>, <4DBFE431.3070904@statistik.tu-dortmund.de> Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669E9F328@WIGGUMVS.win.ad.jhu.edu> Ok, I get it. require(cubature) f <- function(x, a) cos(2*pi*x*a) # a simple test function # this works a <- 0.2 adaptIntegrate(function(x, argA=a) f(x, a=argA), lower=0, upper=2) # but this doesn't work rm(a) adaptIntegrate(function(x, argA=a) f(x, a=argA), lower=0, upper=2, a=0.2) Ravi. ________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Uwe Ligges [ligges at statistik.tu-dortmund.de] Sent: Tuesday, May 03, 2011 7:17 AM To: HC Cc: r-help at r-project.org Subject: Re: [R] adaptIntegrate - how to pass additional parameters to the integrand On 03.05.2011 06:43, HC wrote: > Hello, > > I am trying to use adaptIntegrate function but I need to pass on a few > additional parameters to the integrand. However, this function seems not to > have the flexibility of passing on such additional parameters. > > Am I missing something or this is a known limitation. Is there a good > alternative to such restrictions, if there at all are? Looks like you are talking about the cubature package rather than about base R. 
For the latter question: Please ask the package maintainer rather than the list. Ideally send him code to implement the requested feature and the maintainer will probably add your code. Not all package maintainers read R-help.

For an ad hoc solution: Just use

adaptIntegrate(function(x, argA=a, argB=b) f(x, argA=argA, argB=argB), ......)

in order to set additional arguments for the function call.

Uwe Ligges

> Many thanks for your time.
> HC
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/adaptIntegrate-how-to-pass-additional-parameters-to-the-integrand-tp3491701p3491701.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

From murdoch.duncan at gmail.com Tue May 3 20:02:01 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Tue, 03 May 2011 14:02:01 -0400
Subject: [R] Controlling the extent of ablines on plot
In-Reply-To: 
References: <1304442635.25057.32.camel@localhost>
Message-ID: <4DC04319.4070608@gmail.com>

On 03/05/2011 1:26 PM, Ryan Utz wrote:
> Well... that could work. Problem is in the actual graphs I'm making, there
> are to be >30 lines per graph (as many as 60 in some cases). Any way I could
> use the lines command without having to write out 60 lines of code per
> figure? That's why I like ablines; you just have to specify a single value
> and it will put a horizontal line at that number.
>
> Thanks,
> Ryan

Write your own abline.
For example, with your previously posted example:

xlim <- c(0, 5)
ylim <- c(0, 10)
abline2 <- function(h, v) {
  # segments() takes its arguments in the order x0, y0, x1, y1
  if (!missing(h)) {
    # horizontal lines: x runs across xlim, y is fixed at h
    n <- length(h)
    segments(rep(xlim[1], n), h, rep(xlim[2], n), h)
  }
  if (!missing(v)) {
    # vertical lines: x is fixed at v, y runs across ylim
    n <- length(v)
    segments(v, rep(ylim[1], n), v, rep(ylim[2], n))
  }
}

Just replace your abline() call with abline2().

Duncan Murdoch

From biomathjdaily at gmail.com Tue May 3 20:28:11 2011
From: biomathjdaily at gmail.com (Jonathan Daily)
Date: Tue, 3 May 2011 14:28:11 -0400
Subject: [R] Simple loop
In-Reply-To: <1304437471793-3492819.post@n4.nabble.com>
References: <1304437471793-3492819.post@n4.nabble.com>
Message-ID: 

It is actually possible and preferable to do this with no loops. Assuming your data is in a dataframe called dat:

idx <- with(dat, Site == 1 & Prof == 1)
dat <- within(dat, {
  new = H - ifelse(Site == 1 & Prof == 1, min(H[idx]), min(H[!idx]))
})

dat

which also serves to illuminate the difference between with and within as a bonus.

HTH,
Jon

On Tue, May 3, 2011 at 11:44 AM, Woida71 wrote:
> Hello everybody,
> I am beginning with loops and functions and would be glad to have help in
> the following question:
> If i have a dataframe like this
> Site  Prof  H
> 1      1     24
> 1      1     16
> 1      1     67
> 1      2     23
> 1      2     56
> 1      2     45
> 2      1     67
> 2      1     46
> And I would like to create a new column that subtracts the minimum of H from
> H, but for S1 and P1
> only the minimum of the data points falling into this category should be
> taken.
> So for example the three first numbers of the new column write: 24-16,
> 16-16, 67-16
> the following numbers refering to Site1 and Prof2 write: 23-23, 56-23,
> 45-23.
> I think with two loops one refering to the Site, the other to the Prof, it
> should be possible to automatically
> create the new column.
> Thanks a lot for any help.
> > -- > View this message in context: http://r.789695.n4.nabble.com/Simple-loop-tp3492819p3492819.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- =============================================== Jon Daily Technician =============================================== #!/usr/bin/env outside # It's great, trust me. From djandrija at gmail.com Tue May 3 20:28:02 2011 From: djandrija at gmail.com (andrija djurovic) Date: Tue, 3 May 2011 20:28:02 +0200 Subject: [R] Simple loop In-Reply-To: <1304437471793-3492819.post@n4.nabble.com> References: <1304437471793-3492819.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jfox at mcmaster.ca Tue May 3 20:31:02 2011 From: jfox at mcmaster.ca (John Fox) Date: Tue, 3 May 2011 14:31:02 -0400 Subject: [R] data transformation ----Box-Cox Transformations In-Reply-To: <42f9b7cf-b628-42ba-845b-8d7153a5c337@q32g2000yqn.googlegroups.com> References: <42f9b7cf-b628-42ba-845b-8d7153a5c337@q32g2000yqn.googlegroups.com> Message-ID: <004301cc09c0$42153a40$c63faec0$@mcmaster.ca> Dear Stuart, See ?bcPower and ?powerTransform in the car package, the latter for univariate and multivariate conditional and unconditional ML Box-Cox. 
I hope this helps, John -------------------------------- John Fox Senator William McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On Behalf Of Stuart > Sent: May-03-11 11:37 AM > To: r-help at r-project.org > Subject: [R] data transformation ----Box-Cox Transformations > > Hi > > Could any one please help how I can trnasform data based on Box-Cox > Transformations. I have massive data set with many variables. If > possible someone can write few lines so I can read in all data set once > and transform it. > > > g1 g2 g2 > 97.03703704 89.25925926 4.444444444 > 24.90740741 69.25925926 35.55555556 > 62.22222222 85.18518519 36.85185185 > 18.51851852 84.25925926 21.66666667 > 93.7037037 95.92592593 54.07407407 > 26.66666667 23.33333333 99.25925926 > 63.33333333 97.03703704 27.40740741 > 95.74074074 3.611111111 59.25925926 > 46.66666667 49.44444444 39.16666667 > 21.85185185 2.592592593 63.14814815 > 94.72222222 17.77777778 81.11111111 > > > any help will be much appreciated > > Cheers > Sbroad > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From vioravis at gmail.com Tue May 3 20:55:12 2011 From: vioravis at gmail.com (vioravis) Date: Tue, 3 May 2011 11:55:12 -0700 (PDT) Subject: [R] Loading a FORTRAN DLL Message-ID: <1304448912418-3493263.post@n4.nabble.com> I have a FORTRAN DLL file obtained from Compaq Visual Fortran and when I try to load the DLL into the R environment I get an error. > dyn.load("my_function.dll") "This application has failed to start because MSCVRTD.dll was not found. 
Re-installing this application may fix the problem." When I tried it again, the above error doesn't appear anymore. Instead, I get the following error: Error in inDL(x, as.logical(local), as.logical(now), ...) : unable to load shared library 'D://my_function.dll': LoadLibrary failure: The specified module could not be found. Do I need to have FORTRAN installed to be able to run the DLL file??? Can someone please help me with what is causing this error??? Thank you. Ravi -- View this message in context: http://r.789695.n4.nabble.com/Loading-a-FORTRAN-DLL-tp3493263p3493263.html Sent from the R help mailing list archive at Nabble.com. From xing.qiu at gmail.com Tue May 3 20:48:05 2011 From: xing.qiu at gmail.com (Xing Qiu) Date: Tue, 3 May 2011 14:48:05 -0400 Subject: [R] How do I break a foreach loop? Message-ID: Hi, I've noticed that the usual "break", "next" commands do not work in a foreach loop, is there a nice way to do that? A little more detail: I am using foreach to conduct a very time consuming (may take several days if done sequentially) simulation study. The number of simulations is set to 1000. So the command I am using looks like this simplified version: grandsum <- foreach(icount(1000), .combine="+") %dopar% { sim(...) } In fact, grandsum can never take a value greater than a threshold, say 10.0. So I want the number of iterations depend on the value of grandsum. Say when the grandsum is greater equal to 10.0 the computation should be terminated to save time. This is the for loop version of what is in my mind: grandsum <- 0 for (i in 1:1000) { if (grandsum >= 10.0) break else grandsum <- grandsum + sim(...) } Is there a way to re-write the above for loop by an equivalent foreach/dopar loop? 
Thanks very much,
Xing

From wdunlap at tibco.com Tue May 3 21:04:47 2011
From: wdunlap at tibco.com (William Dunlap)
Date: Tue, 3 May 2011 12:04:47 -0700
Subject: [R] Simple loop
In-Reply-To: 
References: <1304437471793-3492819.post@n4.nabble.com>
Message-ID: <77EB52C6DD32BA4D87471DCD70C8D700042B94AC@NA-PA-VBE03.na.tibco.com>

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of andrija djurovic
> Sent: Tuesday, May 03, 2011 11:28 AM
> To: Woida71
> Cc: r-help at r-project.org
> Subject: Re: [R] Simple loop
>
> Hi.
> There is no need to do this in a for loop.
> Here is one approach:
>
> x <- read.table(textConnection("Site Prof H
> 1 1 24
> 1 1 16
> 1 1 67
> 1 2 23
> 1 2 56
> 1 2 45
> 2 1 67
> 2 1 46"), header = TRUE)
> closeAllConnections()
> x
> cbind(x,newCol=unlist(tapply(x[,3],paste(x[,1],x[,2],sep=""),
> function(x) x-min(x))))
>     Site Prof  H newCol
> 111    1    1 24      8
> 112    1    1 16      0
> 113    1    1 67     51
> 121    1    2 23      0
> 122    1    2 56     33
> 123    1    2 45     22
> 211    2    1 67     21
> 212    2    1 46      0

That works when Site and Prof are ordered as shown, but if they are not sorted, cbind(...,tapply) won't line up the new entries with the old rows properly. Try doing it on x[8:1,] to see this.

ave() can deal with that problem:

> cbind(x, newCol2 = with(x, ave(H, Site, Prof, FUN=function(y)y-min(y))))
  Site Prof  H newCol2
1    1    1 24       8
2    1    1 16       0
3    1    1 67      51
4    1    2 23       0
5    1    2 56      33
6    1    2 45      22
7    2    1 67      21
8    2    1 46       0
Warning message:
In min(y) : no non-missing arguments to min; returning Inf

The warning is unfortunate: ave() calls FUN even when there is no data for a particular group (Site=2, Prof=2 in this case).
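
A possible way to sidestep that warning, sketched here as an assumption and not something proposed in the thread: build the grouping factor with interaction(..., drop = TRUE) so that only Site/Prof combinations present in the data become groups, then pass that single factor to ave(). The column name newCol3 and the helper variables grp, xr and grp_r are made up for this example.

```r
# Toy data from the thread, typed in by hand for this sketch.
x <- data.frame(
  Site = c(1, 1, 1, 1, 1, 1, 2, 2),
  Prof = c(1, 1, 1, 2, 2, 2, 1, 1),
  H    = c(24, 16, 67, 23, 56, 45, 67, 46)
)

# drop = TRUE keeps only the Site/Prof combinations that actually occur,
# so FUN is never handed an empty group.
grp <- with(x, interaction(Site, Prof, drop = TRUE))
x$newCol3 <- ave(x$H, grp, FUN = function(y) y - min(y))
print(x)

# Row order does not matter: the reversed data frame gives the reversed result.
xr <- x[8:1, c("Site", "Prof", "H")]
grp_r <- with(xr, interaction(Site, Prof, drop = TRUE))
xr$newCol3 <- ave(xr$H, grp_r, FUN = function(y) y - min(y))
print(xr)
```

The trade-off is one extra line to build the grouping factor; the values should match newCol2 above.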
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > Andrija > > > On Tue, May 3, 2011 at 5:44 PM, Woida71 wrote: > > > Hello everybody, > > I am beginning with loops and functions and would be glad > to have help in > > the following question: > > If i have a dataframe like this > > Site Prof H > > 1 1 24 > > 1 1 16 > > 1 1 67 > > 1 2 23 > > 1 2 56 > > 1 2 45 > > 2 1 67 > > 2 1 46 > > And I would like to create a new column that subtracts the > minimum of H > > from > > H, but for S1 and P1 > > only the minimum of the data points falling into this > category should be > > taken. > > So for example the three first numbers of the new column > write: 24-16, > > 16-16, 67-16 > > the following numbers refering to Site1 and Prof2 write: > 23-23, 56-23, > > 45-23. > > I think with two loops one refering to the Site, the other > to the Prof, it > > should be possible to automatically > > create the new column. > > Thanks a lot for any help. > > > > -- > > View this message in context: > > http://r.789695.n4.nabble.com/Simple-loop-tp3492819p3492819.html > > Sent from the R help mailing list archive at Nabble.com. > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
>

From djandrija at gmail.com Tue May 3 21:14:20 2011
From: djandrija at gmail.com (andrija djurovic)
Date: Tue, 3 May 2011 21:14:20 +0200
Subject: [R] Simple loop
In-Reply-To: <77EB52C6DD32BA4D87471DCD70C8D700042B94AC@NA-PA-VBE03.na.tibco.com>
References: <1304437471793-3492819.post@n4.nabble.com> <77EB52C6DD32BA4D87471DCD70C8D700042B94AC@NA-PA-VBE03.na.tibco.com>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From sarah.kalicin at intel.com Tue May 3 21:18:03 2011
From: sarah.kalicin at intel.com (Kalicin, Sarah)
Date: Tue, 3 May 2011 12:18:03 -0700
Subject: [R] na.omit - Is it working properly?
Message-ID: <9DA5872FEF993D41B7173F58FCF6BE94D8720D65@orsmsx504.amr.corp.intel.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From djmuser at gmail.com Tue May 3 21:37:23 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Tue, 3 May 2011 12:37:23 -0700
Subject: [R] Simple loop
In-Reply-To: <1304437471793-3492819.post@n4.nabble.com>
References: <1304437471793-3492819.post@n4.nabble.com>
Message-ID:

Hi:

Here are two more candidates, using packages plyr and data.table. Your
toy data frame is called dd below.

library(plyr)
ddply(dd, .(Site, Prof), transform, Hadj = H - min(H))
  Site Prof  H Hadj
1    1    1 24    8
2    1    1 16    0
3    1    1 67   51
4    1    2 23    0
5    1    2 56   33
6    1    2 45   22
7    2    1 67   21
8    2    1 46    0

library(data.table)
dt <- data.table(dd, key = 'Site, Prof')
dt[, list(H = H, Hadj = H - min(H)), by = 'Site, Prof']

HTH,
Dennis

On Tue, May 3, 2011 at 8:44 AM, Woida71 wrote:
> Hello everybody,
> I am beginning with loops and functions and would be glad to have help in
> the following question:
> If I have a dataframe like this
> Site  Prof  H
> 1      1     24
> 1      1     16
> 1      1     67
> 1      2     23
> 1      2     56
> 1      2     45
> 2      1     67
> 2      1     46
> And I would like to create a new column that subtracts the minimum of H from
> H, but for S1 and P1
> only the minimum of the data points falling into this category should be
> taken.
> So for example the three first numbers of the new column write: 24-16,
> 16-16, 67-16
> the following numbers referring to Site1 and Prof2 write: 23-23, 56-23,
> 45-23.
> I think with two loops one referring to the Site, the other to the Prof, it
> should be possible to automatically
> create the new column.
> Thanks a lot for any help.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Simple-loop-tp3492819p3492819.html
> Sent from the R help mailing list archive at Nabble.com.
>

From goran.brostrom at gmail.com Tue May 3 21:40:26 2011
From: goran.brostrom at gmail.com (Göran Broström)
Date: Tue, 3 May 2011 21:40:26 +0200
Subject: [R] ID parameter in model
In-Reply-To: <197e8ae6-a4f8-42c8-9318-8162a3cbae40@gu8g2000vbb.googlegroups.com>
References: <197e8ae6-a4f8-42c8-9318-8162a3cbae40@gu8g2000vbb.googlegroups.com>
Message-ID:

On Mon, May 2, 2011 at 5:38 PM, Mike Harwood wrote:
> Hello,
>
> I am apparently confused about the use of an id parameter for an event
> history/survival model, and why the EHA documentation for aftreg does
> not specify one. All assistance and insights are appreciated.

Which version of eha are you using? The latest version documents the
use of 'id'.

> Attempting to specify an id variable with the documentation example
> generates an "overlapping intervals" error,

Thanks for pointing this out. It is an error (actually three) in the
data frame. The reason is that it is "real" data; and not sufficiently
checked by me.
I'll fix this in an update soon.

Thanks again,

Göran

> so I sorted the original
> mort dataframe and set subsequent entry times for an id to the previous
> exit time + 0.0001. This allowed me to see the effect of the id
> parameter on the coefficients and significance tests, and prompted my
> question. The code I used is shown below, with the results at the
> bottom. Thanks in advance!
>
> Mike
>
> head(mort) ## data clearly contains multiple entries for some of the
> dataframe ids
>
> no.id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort) ##
> Initial model
> id.aft <- aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id)
> ## overlapping intervals error
>
> mort.sort <- ## ensure records ordered
>    mort[
>        order(mort$id, mort$enter),]
>
> ## remove overlap
> for (i in 2:nrow(mort.sort)){
>     if (mort.sort[i,'id'] == mort.sort[i-1,'id'])
>         mort.sort[i,'enter'] <- mort.sort[i-1, 'exit'] + 0.0001
>        }
>
> no.id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, data =
> mort.sort) ## initial model on modified df
> id.aft.sort <- aftreg(Surv(enter, exit, event) ~ ses, id=id, data =
> mort.sort) ## with id parameter
>
>
> #=== output ===========#
>> no.id.aft.sort
> Call:
> aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort)
>
> Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
> ses
>           lower    0.416     0         1           (reference)
>           upper    0.584    -0.347     0.707     0.089     0.000
>
> log(scale)                    3.603    36.704     0.065     0.000
> log(shape)                    0.331     1.393     0.058     0.000
>
> Events                    276
> Total time at risk         17045
> Max. log. likelihood      -1391.4
> LR test statistic         16.1
> Degrees of freedom        1
> Overall p-value           6.04394e-05
>> id.aft.sort
> Call:
> aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort,
>    id = id)
>
> Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
> ses
>           lower    0.416     0         1           (reference)
>           upper    0.584    -0.364     0.695     0.090     0.000
>
> log(scale)                    3.588    36.171     0.065     0.000
> log(shape)                    0.338     1.402     0.058     0.000
>
> Events                    276
> Total time at risk         17045
> Max. log. likelihood      -1390.8
> LR test statistic         17.2
> Degrees of freedom        1
> Overall p-value           3.3091e-05
>>
>

--
Göran Broström

From pdalgd at gmail.com Tue May 3 21:50:04 2011
From: pdalgd at gmail.com (peter dalgaard)
Date: Tue, 3 May 2011 21:50:04 +0200
Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test
In-Reply-To:
References:
Message-ID:

On Apr 28, 2011, at 15:18 , JP wrote:

>
>
> I have found that when doing a wilcoxon signed ranked test you should report:
>
> - The median value (and not the mean or sd, presumably because of the
> underlying potential non normal distribution)
> - The Z score (or value)
> - r
> - p value
>

...printed on 40g/m^2 acid free paper with a pencil of 3B softness?

Seriously, with nonparametrics, the p value is the only thing of real interest, the other stuff is just attempting to check on authors doing their calculations properly. The median difference is of some interest, but it is not actually what is being tested, and in heavily tied data, it could even be zero with a highly significant p-value. The Z score can in principle be extracted from the p value (qnorm(p/2), basically) but it's obviously unstable in the extreme cases. What is r? The correlation? Pearson, not Spearman?
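
To make the qnorm(p/2) remark concrete, here is a minimal sketch on made-up paired data (`before` and `after` are invented for the example). It only recovers the magnitude of a normal-scale score from a two-sided p-value; the sign is not recoverable from p alone, and when wilcox.test() computes an exact p-value the result is only an approximation to a z statistic.

```r
# Made-up paired measurements, for illustration only
set.seed(1)
before <- rnorm(20, mean = 10)
after  <- before + rnorm(20, mean = 0.5)

# Paired Wilcoxon signed rank test
res <- wilcox.test(before, after, paired = TRUE)

# Back out a z-like score from the two-sided p-value;
# qnorm(p/2) is always <= 0, so this gives the magnitude only
z <- qnorm(res$p.value / 2)

c(p = res$p.value, z = z)
```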
> My questions are: > > - Are the above enough/correct values to report (some places even > quote W and df) ? df is silly, and/or blatantly wrong... > What else would you suggest? > - How do I calculate the Z score and r for the above example? > - How do I get each statistic from the pairwise.wilcox.test call? > > Many Thanks > JP > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From clanders at utmb.edu Tue May 3 21:15:05 2011 From: clanders at utmb.edu (algorimancer) Date: Tue, 3 May 2011 12:15:05 -0700 (PDT) Subject: [R] Unexp. behavior from boot with multiple statistics Message-ID: <1304450105554-3493300.post@n4.nabble.com> I am attempting to use package boot to summarize and compare the performance of three models. I'm using R 2.13.0 in a Win32 environment. My statistic function returns a vector of 6 values, 3 of which are error rates for different models, and 3 are pairwise differences between those error rates. It looks like: multiEst<-function(dat,i) { .... c(E1,E2,E3,E2-E1,E3-E1,E3-E2); } then I call boot (using R=4 for simplicity of description) with: multiBoot=boot(data,multiEst,R=4) which gives reasonable results: Bootstrap Statistics : original bias std. error t1* 0.07 0.3775 0.04193249 t2* 0.08 0.3750 0.04654747 t3* 0.04 0.4200 0.05354126 t4* 0.01 -0.0025 0.00500000 t5* -0.03 0.0425 0.01500000 t6* -0.04 0.0450 0.01290994 and the resulting "t0" contains the expected estimates of the statistics, > multiBoot$t0 [1] 0.07 0.08 0.04 0.01 -0.03 -0.04 however "t", which is supposed to contain bootstrap replicates of the statistic, doesn't. 
It looks like this: > multiBoot$t [,1] [,2] [,3] [,4] [,5] [,6] [1,] 0.46 0.47 0.46 0.01 0.00 -0.01 [2,] 0.39 0.39 0.39 0.00 0.00 0.00 [3,] 0.45 0.46 0.47 0.01 0.02 0.01 [4,] 0.49 0.50 0.52 0.01 0.03 0.02 It is not clear where these columns come from --- they clearly do not resemble the estimates in "t0". If I define a separate statistic function for each desired estimate, the resulting "t" and "t0" are as expected, however it is important in this case that the separate estimates derive from the same bootstrap replicates. Any helpful suggestions? Or have I come upon a bug in the implementation? Note: the documentation provides the following definitions for these returned variables: t0 The observed value of statistic applied to data. t A matrix with R rows each of which is a bootstrap replicate of statistic. -- View this message in context: http://r.789695.n4.nabble.com/Unexp-behavior-from-boot-with-multiple-statistics-tp3493300p3493300.html Sent from the R help mailing list archive at Nabble.com. From david.j.meehan at gmail.com Tue May 3 21:37:31 2011 From: david.j.meehan at gmail.com (Rovinpiper) Date: Tue, 3 May 2011 12:37:31 -0700 (PDT) Subject: [R] ANOVA 1 too few degrees of freedom Message-ID: <1304451451151-3493349.post@n4.nabble.com> I'm running an ANOVA on some data for respiration in a forest. I am having a problem with my degrees of freedom. For one of my variables I get one fewer degrees of freedom than I should. I have 12 plots and I therefore expected 11 degrees of freedom, but instead I got 10. Any ideas? 
I have some code and output below:

> class(Combined.Plot)
[1] "character"
> levels(as.factor(Combined.Plot))
 [1] "60m"  "A1"   "B1"   "B3"   "B4"   "C5"   "C9"   "D2"   "D9"   "F60m" "F8"   "Q7"
> nlevels(as.factor(Combined.Plot))
[1] 12
> Anova.Trt.D.M.T.Pr.Model <- aov(Combined.Rs~Combined.Trt +
> as.factor(Combined.Plot) + as.factor(Combined.Day) +
> Combined.Trt*as.factor(Combined.Day) +
> Combined.Plot*as.factor(Combined.Day))
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'Combined.Plot' converted to a factor
> summary(Anova.Trt.D.M.T.Pr.Model)
                                       Df  Sum Sq Mean Sq    F value    Pr(>F)
Combined.Trt                            1   52.80  52.805 2.0186e+30 < 2.2e-16 ***
as.factor(Combined.Plot)               10  677.69  67.769 2.5907e+30 < 2.2e-16 ***
as.factor(Combined.Day)                16 2817.47 176.092 6.7317e+30 < 2.2e-16 ***
Combined.Trt:as.factor(Combined.Day)   16   47.82   2.989 1.1426e+29 < 2.2e-16 ***
as.factor(Combined.Day):Combined.Plot 160  611.21   3.820 1.4604e+29 < 2.2e-16 ***
Residuals                             204    0.00   0.000
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

--
View this message in context: http://r.789695.n4.nabble.com/ANOVA-1-too-few-degrees-of-freedom-tp3493349p3493349.html
Sent from the R help mailing list archive at Nabble.com.

From ghubona at gmail.com Tue May 3 21:07:15 2011
From: ghubona at gmail.com (Geoffrey Hubona)
Date: Tue, 3 May 2011 15:07:15 -0400
Subject: [R] NEW SUMMER ONLINE R COURSE: Fundamentals of Using R
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From rmh at temple.edu Tue May 3 22:19:52 2011
From: rmh at temple.edu (Richard M. Heiberger)
Date: Tue, 3 May 2011 16:19:52 -0400
Subject: [R] ANOVA 1 too few degrees of freedom
In-Reply-To: <1304451451151-3493349.post@n4.nabble.com>
References: <1304451451151-3493349.post@n4.nabble.com>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From mmuurr at gmail.com Tue May 3 22:20:57 2011 From: mmuurr at gmail.com (Murat Tasan) Date: Tue, 3 May 2011 13:20:57 -0700 (PDT) Subject: [R] bootstrap vignette construction and package installation Message-ID: <4a731e6e-fa8e-4339-8311-5640a7ec99cf@z13g2000prk.googlegroups.com> hi all - i'm trying to 'R CMD build' a package, but i have what appears to be a bootstrapping problem: i've included a vignette in my package, with R code interwoven (and built using Sweave), but in this documentation i have a code line: > library(MyPackage) now, when trying to build a .tar.gz install-able version of my package in a clean state, i remove the original compiled/installed package, and the build fails because Sweave wants to execute the interwoven R code even though the package hasn't yet been installed. i got around it by temporarily just hiding the the inst/doc/ directory, installing the now-able-to-build package, then re-building the package after the installation. but this is a pretty clunky solution, and i imagine there's a more elegant way to get around this bootstrapping problem that i'm just not aware of. any thoughts? thanks much for any help, -murat From stevenkennedy2263 at gmail.com Tue May 3 22:25:58 2011 From: stevenkennedy2263 at gmail.com (Steven Kennedy) Date: Wed, 4 May 2011 06:25:58 +1000 Subject: [R] How to fit a random data into Beta distribution? In-Reply-To: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com> References: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com> Message-ID: library(MASS) fitdistr(x,"beta",list(shape1=1,shape2=1)) On Tue, May 3, 2011 at 9:44 PM, Shekhar wrote: > > Hi, > I have some random data and i want to find out the parameters of Beta > distribution ( a and b) such that this data approximately fits into > this distribution. I have tried by plot the histograms and graph, but > it requires lot of tuning and i am unable to do that. 
can anyone tell > me how to do it programmitically in R? > > Regards, > Som Shekhar > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jun.shen.ut at gmail.com Tue May 3 22:43:23 2011 From: jun.shen.ut at gmail.com (Jun Shen) Date: Tue, 3 May 2011 15:43:23 -0500 Subject: [R] Change the names of a dataframe Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dimitri.liakhovitski at gmail.com Tue May 3 22:49:46 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Tue, 3 May 2011 16:49:46 -0400 Subject: [R] turning data with start and end date into daily data Message-ID: Hello! I have data that contain, among other things the date for the beginning and for the end of a (daily) time series (see example below - "mydata") mystring1<-c("String 1", "String 2") mystring2<-c("String a", "String b") starts<-c(as.Date("2011-02-01"),as.Date("2011-03-02")) ends<-c(as.Date("2011-03-15"),as.Date("2011-03-31")) values<-c(2000,10000) mydata<-data.frame(starts=starts,ends=ends,values=values,mystring1=mystring1,mystring2=mystring2) (mydata) I have to reshape it so that: for each row of "mydata" I have daily time series that start on the start date and end on the end date; what used to be in the column "values" has to be distributed equally across those dates; all other columns keep their original values. My code below does it (see the end result "newdata"). However, to achieve my goal, I am looping through rows of "mydata" - I am not sure it will work with my real data set that already has thousands of rows and also a lot of other columns with strings. I am afraid I'll run out of memory. Is there maybe a way of doing it more efficiently? Thanks a lot for your pointers! 
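
One possible vectorized alternative to the row-by-row loop in this post is sketched below. It repeats each row index once per day with rep() and builds the day offsets with sequence(), so no rbind() inside a loop is needed. The helper names `n_days` and `idx` are introduced for illustration, and the sketch assumes the end date should be included in each series, as the description asks.

```r
# Example data as defined in the post
mydata <- data.frame(
  starts    = as.Date(c("2011-02-01", "2011-03-02")),
  ends      = as.Date(c("2011-03-15", "2011-03-31")),
  values    = c(2000, 10000),
  mystring1 = c("String 1", "String 2"),
  mystring2 = c("String a", "String b")
)

# Number of days in each series, counting both start and end dates
n_days <- as.integer(mydata$ends - mydata$starts) + 1L

# Row index of mydata repeated once per day
idx <- rep(seq_len(nrow(mydata)), n_days)

newdata <- data.frame(
  mydate   = mydata$starts[idx] + sequence(n_days) - 1L,  # consecutive dates
  myvalues = (mydata$values / n_days)[idx],               # equal daily share
  mydata[idx, c("mystring1", "mystring2")],               # carried-over columns
  row.names = NULL
)

head(newdata)
```

Because every column is built in one step, memory use grows with the size of the result rather than with repeated copies of a growing data frame.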
newdata<-data.frame(mydate=NA,myvalues=NA,mystring1=NA,mystring2=NA)
for(i in 1:nrow(mydata)){ # i<-2
  start.date = mydata[i,"starts"]
  end.date = mydata[i,"ends"]
  all.dates = seq(start.date, length = end.date - start.date, by = "day")
  temp.df <- data.frame(mydate = all.dates)
  temp.df$myvalues = mydata[i,"values"]/length(all.dates)
  temp.df[names(mydata)[4:5]] = mydata[i,4:5]
  newdata<-rbind(newdata,temp.df)
}
newdata<-newdata[-1,]
(newdata);(mydata)

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com

From Greg.Snow at imail.org Tue May 3 22:51:01 2011
From: Greg.Snow at imail.org (Greg Snow)
Date: Tue, 3 May 2011 14:51:01 -0600
Subject: [R] Change the names of a dataframe
In-Reply-To:
References:
Message-ID:

In the first case you create a new data frame consisting of the 2nd column of the original, then change the name of the only column in that new data frame, then since nothing is done with that data frame it gets thrown away. So it is not that nothing happened, but just that nothing useful happened. The original data frame was never changed, just a copy of it.

The second method accesses the names, then changes the 2nd name and therefore changes the actual data frame of interest.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Jun Shen
> Sent: Tuesday, May 03, 2011 2:43 PM
> To: R-help
> Subject: [R] Change the names of a dataframe
>
> Dear list,
>
> This may sound silly. What is the right way to change the names of a
> dataframe? Let's say I have this data frame (dose) with four columns
> with
> names "ID", "DOSE", "TIME" "CMT". I want to change "DOSE" to "AMT". So
> I did
>
> names(dose[2])<-'AMT'
>
> But nothing happened. The name of the second column is still "DOSE".
>
> Only this works
>
> names(dose)[2]<-'AMT'
>
> I wonder what is wrong with the first method. Thanks.
>
> Jun
>
> [[alternative HTML version deleted]]
>

From rohitpandey576 at gmail.com Tue May 3 22:53:29 2011
From: rohitpandey576 at gmail.com (Rohit Pandey)
Date: Wed, 4 May 2011 02:23:29 +0530
Subject: [R] help with the maxBHHH routine
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From baptiste.auguie at googlemail.com Tue May 3 22:54:24 2011
From: baptiste.auguie at googlemail.com (baptiste auguie)
Date: Wed, 4 May 2011 08:54:24 +1200
Subject: [R] adaptIntegrate - how to pass additional parameters to the integrand
In-Reply-To: <79F23BA7BB084E4FA01A8B93904CD02CF669E9F328@WIGGUMVS.win.ad.jhu.edu>
References: <1304397791609-3491701.post@n4.nabble.com> <4DBFE431.3070904@statistik.tu-dortmund.de> <79F23BA7BB084E4FA01A8B93904CD02CF669E9F328@WIGGUMVS.win.ad.jhu.edu>
Message-ID:

Hi,

The package maintainer is aware of this feature request. In the
meantime, I've used Currying,

require(cubature)
f <- function(x, a) cos(2*pi*x*a) # a simple test function

adaptIntegrate(roxygen::Curry(f, a=0.2), lower=0, upper=2)

HTH,

baptiste

On 4 May 2011 05:57, Ravi Varadhan wrote:
> Ok, I get it.
>
> require(cubature)
>
> f <- function(x, a) cos(2*pi*x*a) # a simple test function
>
> # this works
> a <- 0.2
> adaptIntegrate(function(x, argA=a) f(x, a=argA), lower=0, upper=2)
>
> # but this doesn't work
> rm(a)
> adaptIntegrate(function(x, argA=a) f(x, a=argA), lower=0, upper=2, a=0.2)
>
> Ravi.
>
> ________________________________________
> From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Uwe Ligges [ligges at statistik.tu-dortmund.de]
> Sent: Tuesday, May 03, 2011 7:17 AM
> To: HC
> Cc: r-help at r-project.org
> Subject: Re: [R] adaptIntegrate - how to pass additional parameters to the integrand
>
> On 03.05.2011 06:43, HC wrote:
>> Hello,
>>
>> I am trying to use adaptIntegrate function but I need to pass on a few
>> additional parameters to the integrand. However, this function seems not to
>> have the flexibility of passing on such additional parameters.
>>
>> Am I missing something or this is a known limitation. Is there a good
>> alternative to such restrictions, if there at all are?
>
>
> Looks like you are talking about the cubature package rather than about
> base R. For the latter question: Please ask the package maintainer
> rather than the list. Ideally send him code to implement the requested
> feature and the maintainer will probably add your code. Not all package
> maintainers read R-help.
>
> For an ad hoc solution:
>
> Just use
>
> adaptIntegrate(function(x, argA=a, argB=b) f(x, argA=argA, argB=argB),
> ......)
>
> in order to set additional arguments for the function call.
>
> Uwe Ligges
>
>
>> Many thanks for your time.
>> HC
>>
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/adaptIntegrate-how-to-pass-additional-parameters-to-the-integrand-tp3491701p3491701.html
>> Sent from the R help mailing list archive at Nabble.com.
>

From murdoch.duncan at gmail.com Tue May 3 22:54:28 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Tue, 3 May 2011 16:54:28 -0400
Subject: [R] Change the names of a dataframe
In-Reply-To:
References:
Message-ID: <4DC06B84.6010608@gmail.com>

On 03/05/2011 4:43 PM, Jun Shen wrote:
> Dear list,
>
> This may sound silly. What is the right way to change the names of a
> dataframe? Let's say I have this data frame (dose) with four columns with
> names "ID", "DOSE", "TIME" "CMT". I want to change "DOSE" to "AMT". So I did
>
> names(dose[2])<-'AMT'
>
> But nothing happened. The name of the second column is still "DOSE".
>
> Only this works
>
> names(dose)[2]<-'AMT'
>
> I wonder what is wrong with the first method. Thanks.
>

The first method says:

1. Get dose[2]. This is a dataframe consisting of the second column of
dose. Store it in a temporary, unnamed variable.

2. Change its names to 'AMT'. This changes the names of the temporary
variable, which then disappears.

The second method says:

1. Set the second element of the names associated with dose to 'AMT'.
That's what you want to do.
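
The difference between the two forms can be checked directly. This is a minimal sketch with a made-up `dose` data frame (only the column names come from the thread; the values are invented):

```r
# Made-up data frame with the column names from the question
dose <- data.frame(ID = 1:2, DOSE = c(10, 20), TIME = c(0, 1), CMT = c(1, 1))

# First method: renames a temporary one-column data frame;
# the column name in dose itself is unchanged
names(dose[2]) <- "AMT"
names_after_first <- names(dose)

# Second method: replaces the second element of names(dose)
names(dose)[2] <- "AMT"
names_after_second <- names(dose)

names_after_first   # second element is still "DOSE"
names_after_second  # second element is now "AMT"
```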
Duncan Murdoch From Andrew.McFadden at maf.govt.nz Tue May 3 23:05:23 2011 From: Andrew.McFadden at maf.govt.nz (Andrew McFadden) Date: Wed, 4 May 2011 09:05:23 +1200 Subject: [R] Overlapping x axes using Lattice Message-ID: <787543A672A2C543B8281EEADC2FE9017A7940@WDCWMSP386.network.maf.govt.nz> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From david.j.meehan at gmail.com Tue May 3 23:33:12 2011 From: david.j.meehan at gmail.com (Rovinpiper) Date: Tue, 3 May 2011 14:33:12 -0700 (PDT) Subject: [R] ANOVA 1 too few degrees of freedom In-Reply-To: References: <1304451451151-3493349.post@n4.nabble.com> Message-ID: <1304458392408-3493632.post@n4.nabble.com> Hi Richard, Thanks for your advice. I think that your suggestion is that I run the ANOVA with Combined.Plot as a factor. I have tried that does not alleviate the problem. Did I understand you properly? Do you have another idea? Thanks, David -- View this message in context: http://r.789695.n4.nabble.com/ANOVA-1-too-few-degrees-of-freedom-tp3493349p3493632.html Sent from the R help mailing list archive at Nabble.com. From patrick.breheny at uky.edu Tue May 3 23:44:39 2011 From: patrick.breheny at uky.edu (Breheny, Patrick) Date: Tue, 3 May 2011 17:44:39 -0400 Subject: [R] Overlapping x axes using Lattice In-Reply-To: <787543A672A2C543B8281EEADC2FE9017A7940@WDCWMSP386.network.maf.govt.nz> References: <787543A672A2C543B8281EEADC2FE9017A7940@WDCWMSP386.network.maf.govt.nz> Message-ID: <408338F86F0D4243BD5E7B74A8C0862B20569E206B@EX7FM03.ad.uky.edu> I'm not clear on what you're looking for here. Your x-axis is numeric, why are you converting it to a factor? If you keep it numeric, the labels don't overlap. Or perhaps you don't want it to be numeric, in which case why not just change the aspect ratio of the plot until they no longer overlap? Or change the axis font size? Or put fewer labels on the axis? 
There is a finite amount of space on the axis, so you have to either make more space, shrink the labels, or draw fewer of them. -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Andrew McFadden Sent: Tuesday, May 03, 2011 5:05 PM To: r-help at r-project.org Subject: [R] Overlapping x axes using Lattice Hi R users I apologise in advance for this question as I suspect it is simple and perhaps others have had this problem. I am struggling to sort out how to fix the x axes so that the labels don't overlap. I have put the following example together to show my problem. library(lattice) titre <- as.factor(rep(c(10999,20999,30999,40999,50999,60999, 20000,40000,45000,50000,70000,80000),c(2,2,2,2,1,1,2,2,2,2,1,1))) test<-rep(c("BVD","HSD"),c(10,10)) age<-as.factor(rep(c(1,2,3,4),c(5,5,5,5))) pesti<-data.frame(titre,test,age) histogram(~pesti[,1]|pesti[,2]+ pesti[,3] ,alternating=TRUE,tick.number=1, stack=TRUE,type = "count", xlab="VNT",rot=c(180,180),draw=FALSE) Thank you in advance. Andy Andrew McFadden MVS BVSc Incursion Investigator Investigation & Diagnostic Centres - Wallaceville Biosecurity New Zealand Ministry of Agriculture and Forestry Phone 04 894 5600 Fax 04 894 4973 Mobile 029 894 5611 Postal address: Investigation and Diagnostic Centre- Wallaceville Box 40742 Ward St Upper Hutt This email message and any attachment(s) is intended solely for the addressee(s) named above. The information it contains is confidential and may be legally privileged. Unauthorised use of the message, or the information it contains, may be unlawful. If you have received this message by mistake please call the sender immediately on 64 4 8940100 or notify us by return email and erase the original message and attachments. Thank you. The Ministry of Agriculture and Forestry accepts no responsibility for changes made to this email or to any attachments after transmission from the office. 
[[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From benrhelp at yahoo.co.uk Wed May 4 00:23:22 2011 From: benrhelp at yahoo.co.uk (Ben Rhelp) Date: Tue, 3 May 2011 23:23:22 +0100 (BST) Subject: [R] Compiling Rgraphiz on Windows 7 64bit with R-2.13.0 Message-ID: <588572.79565.qm@web29207.mail.ird.yahoo.com> Hi all, I am trying to compile Rgraphiz on Windows 7 64bit with R-2.13.0. I have installed Rtools213.exe from [1]. The 64bit packages in [2] provided me with the 64 bit version of graphviz. After intalling the binary version Rgraphviz 1.30 (in 32bit) it complains (as expected) that: > library(Rgraphviz) Error: package 'Rgraphviz' is not installed for 'arch=x64' I don't understand why the 64 bit version of graphiz is provided but not one for Rgraphviz. Have I missed it somewhere? In any case, it is suggested to build it from source, so I tried following the steps of the README from the source package of Rgraphviz (see below). I have the same error than in [3]. Does anyone know what is going on or if Kasper found a solution back in 2009? thanks in advance, Cheers, Ben C:\BenSave>R --arch x64 CMD build --binary .\Rgraphviz --binary is deprecated * checking for file '.\Rgraphviz/DESCRIPTION' ... OK * preparing 'Rgraphviz': * checking DESCRIPTION meta-information ... OK * cleaning src * installing the package to re-build vignettes ----------------------------------- * installing *source* package 'Rgraphviz' ... 
Using the following environment variables GRAPHVIZ_INSTALL_DIR=C:\/BenSave\/GoodiesWin64\/graphviz GRAPHVIZ_INSTALL_MAJOR=2 GRAPHVIZ_INSTALL_MINOR=20 GRAPHVIZ_INSTALL_SUBMINOR=3 Using the following compilation and linking flags for Rgraphviz PKG_CPPFLAGS=-IC:\/BenSave\/GoodiesWin64\/graphviz/include/graphviz PKG_LIBS=-LC:\/BenSave\/GoodiesWin64\/graphviz/bin -lgvc-4 -lgraph-4 -lcdt-4 GVIZ_DEFS=-DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 Created file src/Makevars.win Created file R/graphviz_build_version.R ** libs cygwin warning: MS-DOS style path detected: C:/PROGRA~1/R/R-213~1.0/etc/x64/Makeconf Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-213~1.0/etc/x64/Makeco nf CYGWIN environment variable option "nodosfilewarning" turns off this warning. Consult the user's guide for more details about POSIX paths: http://cygwin.com/cygwin-ug-net/using.html#using-pathnames x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c LL_funcs.c -o LL_funcs.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c Rgraphviz.c -o Rgraphviz.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c RgraphvizInit.c -o RgraphvizInit.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c agopen.c -o agopen.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c agread.c -o agread.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" 
-IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c agwrite.c -o agwrite.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c bezier.c -o bezier.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c buildEdgeList.c -o buildEdgeList.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c buildNodeList.c -o buildNodeList.o x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 -O2 -Wall -std=gnu99 -c doLayout.c -o doLayout.o doLayout.c: In function 'getEdgeLocs': doLayout.c:131:17: error: 'textlabel_t' has no member named 'p' doLayout.c:132:17: error: 'textlabel_t' has no member named 'p' doLayout.c: In function 'getNodeLayouts': doLayout.c:243:13: error: 'textlabel_t' has no member named 'p' doLayout.c:244:13: error: 'textlabel_t' has no member named 'p' make: *** [doLayout.o] Error 1 ERROR: compilation failed for package 'Rgraphviz' * removing 'C:/Users/BVINSO~1/AppData/Local/Temp/Rtmpz6M19V/Rinst76da24d2/Rgraph viz' ----------------------------------- ERROR: package installation failed [1] http://www.murdoch-sutherland.com/Rtools/ [2] http://www.stats.ox.ac.uk/pub/Rtools/goodies/Win64No_/ [3] https://stat.ethz.ch/pipermail/bioconductor/2009-March/026585.html From gleynes+r at gmail.com Wed May 4 00:24:12 2011 From: gleynes+r at gmail.com (Gene Leynes) Date: Tue, 3 May 2011 17:24:12 -0500 Subject: [R] having trouble with "R CMD INSTALL" In-Reply-To: <4DAE9E40.2000108@statistik.tu-dortmund.de> References: 
<4DAE9E40.2000108@statistik.tu-dortmund.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From A.Robinson at ms.unimelb.edu.au Wed May 4 01:11:54 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 09:11:54 +1000 Subject: [R] Unexp. behavior from boot with multiple statistics In-Reply-To: <1304450105554-3493300.post@n4.nabble.com> References: <1304450105554-3493300.post@n4.nabble.com> Message-ID: <20110503231154.GP48756@ms.unimelb.edu.au> Your interpretation of what the output is supposed to look like is actually correct. Take a look at the estimates of the bias in the BootStrap Statistics. You will see that they are the same as the difference between the location of colMeans of t and t0. I hope that this helps, Andrew On Tue, May 03, 2011 at 12:15:05PM -0700, algorimancer wrote: > I am attempting to use package boot to summarize and compare the performance > of three models. I'm using R 2.13.0 in a Win32 environment. > > My statistic function returns a vector of 6 values, 3 of which are error > rates for different models, and 3 are pairwise differences between those > error rates. It looks like: > > multiEst<-function(dat,i) > { > .... > c(E1,E2,E3,E2-E1,E3-E1,E3-E2); > } > > then I call boot (using R=4 for simplicity of description) with: > > multiBoot=boot(data,multiEst,R=4) > > which gives reasonable results: > > Bootstrap Statistics : > original bias std. error > t1* 0.07 0.3775 0.04193249 > t2* 0.08 0.3750 0.04654747 > t3* 0.04 0.4200 0.05354126 > t4* 0.01 -0.0025 0.00500000 > t5* -0.03 0.0425 0.01500000 > t6* -0.04 0.0450 0.01290994 > > and the resulting "t0" contains the expected estimates of the statistics, > > multiBoot$t0 > [1] 0.07 0.08 0.04 0.01 -0.03 -0.04 > > however "t", which is supposed to contain bootstrap replicates of the > statistic, doesn't. 
It looks like this: > > multiBoot$t > [,1] [,2] [,3] [,4] [,5] [,6] > [1,] 0.46 0.47 0.46 0.01 0.00 -0.01 > [2,] 0.39 0.39 0.39 0.00 0.00 0.00 > [3,] 0.45 0.46 0.47 0.01 0.02 0.01 > [4,] 0.49 0.50 0.52 0.01 0.03 0.02 > > It is not clear where these columns come from --- they clearly do not > resemble the estimates in "t0". > > If I define a separate statistic function for each desired estimate, the > resulting "t" and "t0" are as expected, however it is important in this case > that the separate estimates derive from the same bootstrap replicates. > > Any helpful suggestions? Or have I come upon a bug in the implementation? > > Note: the documentation provides the following definitions for these > returned variables: > > t0 The observed value of statistic applied to data. > t A matrix with R rows each of which is a bootstrap replicate of statistic. > > > > > > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Unexp-behavior-from-boot-with-multiple-statistics-tp3493300p3493300.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
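The relationship mentioned in the reply can be sketched in base R, without the boot package, so that the example is self-contained. Everything here is a made-up stand-in: `dat` replaces the poster's `data`, `stat` replaces `multiEst`, and `tstar` plays the role of `multiBoot$t`. The sketch assumes that the reported bias is the column means of the replicates minus the original estimates, as stated in the reply.

```r
set.seed(123)
dat <- data.frame(x = rnorm(40), y = rnorm(40))   # made-up data

# A statistic that returns several values computed from the SAME resample,
# including a pairwise difference (analogous to E2 - E1 in the question).
stat <- function(d, i) {
  m <- colMeans(d[i, ])
  c(m[["x"]], m[["y"]], m[["y"]] - m[["x"]])
}

R     <- 200
t0    <- stat(dat, seq_len(nrow(dat)))     # estimates on the original data
tstar <- t(replicate(R, stat(dat, sample(nrow(dat), replace = TRUE))))

# One row per bootstrap replicate, one column per returned value.
dim(tstar)

# Bias and standard error, column by column.
bias <- colMeans(tstar) - t0
se   <- apply(tstar, 2, sd)
cbind(original = t0, bias = bias, std.error = se)
```

Because all values in a row come from the same resample, the third column of `tstar` is exactly the difference of the first two, which is the property the poster wanted to keep.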
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Wed May 4 01:14:08 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 09:14:08 +1000 Subject: [R] help with the maxBHHH routine In-Reply-To: References: Message-ID: <20110503231408.GQ48756@ms.unimelb.edu.au> I suggest that you provide some commented, minimal, self-contained, reproducible code. Cheers Andrew On Wed, May 04, 2011 at 02:23:29AM +0530, Rohit Pandey wrote: > Hello R community, > > I have been using R's inbuilt maximum likelihood functions, for the > different methods (NR, BFGS, etc). > > I have figured out how to use all of them except the maxBHHH function. This > one is different from the others as it requires an observation level > gradient. > > I am using the following syntax: > > maxBHHH(logLik,grad=nuGradient,finalHessian="BHHH",start=prm,iterlim=2) > > where logLik is the likelihood function and returns a vector of observation > level likelihoods and nuGradient is a function that returns a matrix with > each row corresponding to a single observation and the columns corresponding > to the gradient values for each parameter (as is mentioned in the online > help). 
> > however, this gives me the following error: > > *Error in checkBhhhGrad(g = gr, theta = theta, analytic = (!is.null(attr(f, > : > the matrix returned by the gradient function (argument 'grad') must have > at least as many rows as the number of parameters (10), where each row must > correspond to the gradients of the log-likelihood function of an individual > (independent) observation: > currently, there are (is) 10 parameter(s) but the gradient matrix has only > 2 row(s) > * > It seems it is expecting as many rows as there are parameters. So, I changed > my likelihood function so that it would return the transpose of the earlier > matrix (hence returning a matrix with rows equaling parameters and columns, > observations). > > However, when I run the function again, I still get an error: > *Error in gr[, fixed] <- NA : (subscript) logical subscript too long* > > I have verified that my gradient function, when summed across observations > gives the same results as the in built numerical gradient (to the 11th > decimal place - after that, they differ since R's function is numerical). > > I am trying to run a very large estimation (1000's of observations and 821 > parameters) and all of the other methods are taking way too much time > (days). This method is our last hope and so, any help will be greatly > appreciated. > > -- > Thanks in advance, > Rohit > Mob: 91 9819926213 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
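For what it is worth, the shape that the quoted error message asks for (one row per observation, one column per parameter) can be checked in base R before calling any optimiser. The sketch below does not use maxLik at all. The normal model and the names `y`, `loglik_obs` and `grad_obs` are hypothetical stand-ins, not the poster's `logLik` and `nuGradient`.

```r
set.seed(42)
y <- rnorm(50, mean = 2, sd = 1.5)   # made-up data, N = 50

# Observation-level log-likelihood: one value per observation.
loglik_obs <- function(theta) {
  dnorm(y, mean = theta[1], sd = theta[2], log = TRUE)
}

# Observation-level gradient: N rows (observations) by K columns (parameters).
grad_obs <- function(theta) {
  mu <- theta[1]
  sigma <- theta[2]
  cbind(mu    = (y - mu) / sigma^2,
        sigma = -1 / sigma + (y - mu)^2 / sigma^3)
}

theta0 <- c(1, 2)
G <- grad_obs(theta0)
dim(G)   # rows = observations, columns = parameters

# Central finite differences of the summed log-likelihood, for comparison
# with the column sums of the analytic per-observation gradient.
num_grad <- sapply(seq_along(theta0), function(k) {
  h <- 1e-6
  e <- replace(numeric(length(theta0)), k, h)
  (sum(loglik_obs(theta0 + e)) - sum(loglik_obs(theta0 - e))) / (2 * h)
})
max(abs(colSums(G) - num_grad))
```

If `dim()` of the real gradient reports far fewer rows than there are observations, the function is probably being evaluated on something other than the full data, which would be worth checking before transposing anything.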
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Wed May 4 01:20:07 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 09:20:07 +1000 Subject: [R] delete excel id automatically generated In-Reply-To: <1304418577180-3492147.post@n4.nabble.com> References: <1304418577180-3492147.post@n4.nabble.com> Message-ID: <20110503232007.GR48756@ms.unimelb.edu.au> Try the function rownames() Andrew On Tue, May 03, 2011 at 03:29:37AM -0700, agent dunham wrote: > Dear community, > > I uploaded an excel with read.xls. My xls file actually have a column which > is an id, ("plot" is the id) : > > plot height area > 34 7.6 5.4 > 85 3.2 4.1 > 89 5.4 8.4 > 121 6.7 6.2 > ... > 1325 2.1 1.5 > > However R uses another id, this way: > > r id plot height area > 1 34 7.6 5.4 > 2 85 3.2 4.1 > 3 89 5.4 8.4 > 4 121 6.7 6.2 > ... > 314 1325 2.1 1.5 > > I'd like that R used "plot" id because I delete some rows while studying > regression, and R seems to be using the first id 1,2,3,4,...,314. Sometimes > it's a mess to understand what R means in the plots when, for instance, > states that data 200 is influential > > Thanks in advance, user at host.com > > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/delete-excel-id-automatically-generated-tp3492147p3492147.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Wed May 4 01:30:25 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 09:30:25 +1000 Subject: [R] na.omit - Is it working properly? In-Reply-To: <9DA5872FEF993D41B7173F58FCF6BE94D8720D65@orsmsx504.amr.corp.intel.com> References: <9DA5872FEF993D41B7173F58FCF6BE94D8720D65@orsmsx504.amr.corp.intel.com> Message-ID: <20110503233025.GS48756@ms.unimelb.edu.au> Hi Sarah, I'm not sure that I understand your problem. You have shown us three ways to try to omit missing values, and one of them seems to work. But you're concerned because some aspect of it doesn't match the ones that don't work? But they don't work! I wonder if you could send an example in commented, minimal, self-contained, reproducible code ... Cheers Andrew On Tue, May 03, 2011 at 12:18:03PM -0700, Kalicin, Sarah wrote: > > I have a work around for this, but can someone explain why the first example does not work properly? I believed it worked in the previous version of R, by selecting just the rows=200525 and omitting the na's. I just upgraded to 2.13. I am also concern with the row numbers being different in the selections, should I be worried? 
FYI, I just selected the first few rows for demonstration, please do not worry that the number of rows shown are not equal. - Sarah > > With na.omit around the column, but it is showing other values in the F.WW column other than 200525, along with NA. I was hoping that this would omit all the NA's, and show all the rows that P$F.WW=200525. I believe it did with the previous version of R. > P[na.omit(P$F.WW)==200525, c(51, 52)] > F.WW R.WW > 45 200525 NA > 53 NA NA > 61 200534 200534 > 63 200608 200608 > 66 200522 200541 > 80 NA NA > 150 200521 200516 > 231 200530 200530 > > No na.omit, the F.WW=200525 seems to work, but lots of NA included. This is what is expected!! The row numbers are not the same as the above example, except the first row. > > P[P$F.WW==200525, c(51, 52)] > F.WW R.WW > 45 200525 NA > NA NA NA > NA.1 NA NA > NA.2 NA NA > NA.3 NA NA > 57 200525 200526 > 65 200525 NA > 67 200525 NA > 70 200525 200525 > NA.4 NA NA > NA.5 NA NA > 86 200525 NA > > Na.omit excludes the na's. This is what I want. The concern I have is why the row numbers do not match any of those shown in the examples above. > > na.omit(P[P$F.WW==200525, c(51, 52)]) > F.WW R.WW > 57 200525 200526 > 70 200525 200525 > 161 200525 200525 > 245 200525 200525 > 246 200525 200525 > 247 200525 200526 > 256 200525 200525 > 266 200525 200525 > 269 200525 200525 > 271 200525 200526 > 276 200525 200526 > 278 200525 200526 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From kelsey.soup at gmail.com Wed May 4 00:47:57 2011 From: kelsey.soup at gmail.com (Kelsey Ketcheson) Date: Tue, 3 May 2011 15:47:57 -0700 Subject: [R] error term for ANOVA of generalized randomized block design Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mtmorgan at fhcrc.org Wed May 4 02:01:35 2011 From: mtmorgan at fhcrc.org (Martin Morgan) Date: Tue, 03 May 2011 17:01:35 -0700 Subject: [R] Compiling Rgraphiz on Windows 7 64bit with R-2.13.0 In-Reply-To: <588572.79565.qm@web29207.mail.ird.yahoo.com> References: <588572.79565.qm@web29207.mail.ird.yahoo.com> Message-ID: <4DC0975F.7060700@fhcrc.org> On 05/03/2011 03:23 PM, Ben Rhelp wrote: > Hi all, > > I am trying to compile Rgraphiz on Windows 7 64bit with R-2.13.0. I have > installed > > Rtools213.exe from [1]. The 64bit packages in [2] provided me with the 64 bit > version > of graphviz. After intalling the binary version Rgraphviz 1.30 (in 32bit) it > complains (as > > expected) that: >> library(Rgraphviz) > Error: package 'Rgraphviz' is not installed for 'arch=x64' > > I don't understand why the 64 bit version of graphiz is provided but not one for > Rgraphviz. > Have I missed it somewhere? In any case, it is suggested to build it from > source, so I tried > following the steps of the README from the source package of Rgraphviz (see > below). I have the > same error than in [3]. Does anyone know what is going on or if Kasper found a > solution back > > in 2009? 
> > thanks in advance, > > Cheers, > > Ben > > > C:\BenSave>R --arch x64 CMD build --binary .\Rgraphviz > --binary is deprecated > * checking for file '.\Rgraphviz/DESCRIPTION' ... OK > * preparing 'Rgraphviz': > * checking DESCRIPTION meta-information ... OK > * cleaning src > * installing the package to re-build vignettes > ----------------------------------- > * installing *source* package 'Rgraphviz' ... > Using the following environment variables > GRAPHVIZ_INSTALL_DIR=C:\/BenSave\/GoodiesWin64\/graphviz > GRAPHVIZ_INSTALL_MAJOR=2 > GRAPHVIZ_INSTALL_MINOR=20 > GRAPHVIZ_INSTALL_SUBMINOR=3 These should be set to match the version of the graphviz library you're using, MINOR=25 SUBMINOR=20090912.0445 > Using the following compilation and linking flags for Rgraphviz > PKG_CPPFLAGS=-IC:\/BenSave\/GoodiesWin64\/graphviz/include/graphviz > PKG_LIBS=-LC:\/BenSave\/GoodiesWin64\/graphviz/bin -lgvc-4 -lgraph-4 -lcdt-4 Unfortunately, these will now be incorrect; edit Rgraphviz/configure.win so that the line that includes test ${GRAPHVIZ_INSTALL_MINOR} -eq "21" reads test ${GRAPHVIZ_INSTALL_MINOR} -ge "21" Martin > GVIZ_DEFS=-DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > Created file src/Makevars.win > Created file R/graphviz_build_version.R > ** libs > cygwin warning: > MS-DOS style path detected: C:/PROGRA~1/R/R-213~1.0/etc/x64/Makeconf > Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-213~1.0/etc/x64/Makeco > nf > CYGWIN environment variable option "nodosfilewarning" turns off this warning. 
> Consult the user's guide for more details about POSIX paths: > http://cygwin.com/cygwin-ug-net/using.html#using-pathnames > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c LL_funcs.c -o LL_funcs.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c Rgraphviz.c -o Rgraphviz.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c RgraphvizInit.c -o RgraphvizInit.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c agopen.c -o agopen.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c agread.c -o agread.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c agwrite.c -o agwrite.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c bezier.c -o bezier.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c buildEdgeList.c -o buildEdgeList.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 
-DWin32 > -O2 -Wall -std=gnu99 -c buildNodeList.c -o buildNodeList.o > x86_64-w64-mingw32-gcc -I"C:/PROGRA~1/R/R-213~1.0/include" -IC:/BenSave/GoodiesW > in64/graphviz/include/graphviz -DGRAPHVIZ_MAJOR=2 -DGRAPHVIZ_MINOR=20 -DWin32 > -O2 -Wall -std=gnu99 -c doLayout.c -o doLayout.o > doLayout.c: In function 'getEdgeLocs': > doLayout.c:131:17: error: 'textlabel_t' has no member named 'p' > doLayout.c:132:17: error: 'textlabel_t' has no member named 'p' > doLayout.c: In function 'getNodeLayouts': > doLayout.c:243:13: error: 'textlabel_t' has no member named 'p' > doLayout.c:244:13: error: 'textlabel_t' has no member named 'p' > make: *** [doLayout.o] Error 1 > ERROR: compilation failed for package 'Rgraphviz' > * removing 'C:/Users/BVINSO~1/AppData/Local/Temp/Rtmpz6M19V/Rinst76da24d2/Rgraph > viz' > ----------------------------------- > ERROR: package installation failed > > > > > [1] http://www.murdoch-sutherland.com/Rtools/ > [2] http://www.stats.ox.ac.uk/pub/Rtools/goodies/Win64No_/ > [3] https://stat.ethz.ch/pipermail/bioconductor/2009-March/026585.html > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 From jbustosmelo at yahoo.es Wed May 4 02:15:53 2011 From: jbustosmelo at yahoo.es (Jose Bustos Melo) Date: Wed, 4 May 2011 01:15:53 +0100 (BST) Subject: [R] scatterplot3d using colors in groups In-Reply-To: Message-ID: <979900.68251.qm@web26501.mail.ukl.yahoo.com> An embedded and charset-unspecified text was scrubbed... 
Name: no disponible URL: From A.Robinson at ms.unimelb.edu.au Wed May 4 02:20:20 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 10:20:20 +1000 Subject: [R] Watts Strogatz game In-Reply-To: <1304408575725-3491922.post@n4.nabble.com> References: <1304408575725-3491922.post@n4.nabble.com> Message-ID: <20110504002020.GT48756@ms.unimelb.edu.au> Hi, I have no familiarity with these functions --- I see that they are not in base R --- so I suggest that at very least you identify the package that you are using. Better would be to contact the package maintainer directly. Sometimes maintainers do not read R-help. Cheers Andrew On Tue, May 03, 2011 at 12:42:55AM -0700, kparamas wrote: > Hi, > > I have a erdos-renyi game with 6000 nodes and probability 0.003. > > g1 = erdos.renyi.game(6000, 0.003) > > How to create a Watts Strogatz game with the same probability. > > g1 = watts.strogatz.game(1, 6000, ?, ?) > What should be the third and fourth parameter to this argument. > > > -- > View this message in context: http://r.789695.n4.nabble.com/Watts-Strogatz-game-tp3491922p3491922.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
-- 
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia               (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr              Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/

From mackay at northnet.com.au Wed May 4 02:37:29 2011
From: mackay at northnet.com.au (Duncan Mackay)
Date: Wed, 04 May 2011 10:37:29 +1000
Subject: [R] scatterplot3d using colors in groups
In-Reply-To: <979900.68251.qm@web26501.mail.ukl.yahoo.com>
References: <979900.68251.qm@web26501.mail.ukl.yahoo.com>
Message-ID: <201105040038.p440ciks005736@mail15.tpg.com.au>

At 10:15 04/05/2011, you wrote:
>Hi everyone,
>
>I would like to improve my plot and I was wondering if someone can help me with it. I'm trying this plot using two groups, but I want to choose the colors (the black and white circles) and I don't know how to change them from here. These are my commands:
>
>myplot3d<- scatterplot3d(myfile$Temperature,
>    acantarcthus$Salinity,myfile$Abundance, type="h",
>    color = as.integer(factor(myfile$groups)))
>
>If someone is willing to help me with it I would be so glad.
>
>José
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

Hi José

Have you seen the result of as.integer(factor(myfile$groups))? It would have given you a clue. What you need is c("black","white") or the numerical col number equivalent by the order of the groups.
2 ways: ifelse(myfile$groups == factor1, "black","white") or if you had more factors sapply(myfile$groups, pmatch, unique(myfile$groups) ) Regards Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email: home mackay at northnet.com.au From rvaradhan at jhmi.edu Wed May 4 03:34:00 2011 From: rvaradhan at jhmi.edu (Ravi Varadhan) Date: Tue, 3 May 2011 21:34:00 -0400 Subject: [R] help with the maxBHHH routine In-Reply-To: References: Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669E9F332@WIGGUMVS.win.ad.jhu.edu> maxBHHH is *not* an in-built R function. It is in a distributed package called "maxLik". Always tell us which package is being used so that it is easier for us to help you. The error message says that the gradient function is returning a 10 x 2 matrix, whereas you say that you have 1000's of observations and 821 parameters. Show us a simplified version of your problem. It is difficult to help you without seeing an example. Also, try running w/o the gradient and see how it works. This is not the answer you wanted, but we cannot help you w/o seeing your example. Ravi. ________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Rohit Pandey [rohitpandey576 at gmail.com] Sent: Tuesday, May 03, 2011 4:53 PM To: r-help at r-project.org Subject: [R] help with the maxBHHH routine Hello R community, I have been using R's inbuilt maximum likelihood functions, for the different methods (NR, BFGS, etc). I have figured out how to use all of them except the maxBHHH function. This one is different from the others as it requires an observation level gradient. 
I am using the following syntax: maxBHHH(logLik,grad=nuGradient,finalHessian="BHHH",start=prm,iterlim=2) where logLik is the likelihood function and returns a vector of observation level likelihoods and nuGradient is a function that returns a matrix with each row corresponding to a single observation and the columns corresponding to the gradient values for each parameter (as is mentioned in the online help). however, this gives me the following error: *Error in checkBhhhGrad(g = gr, theta = theta, analytic = (!is.null(attr(f, : the matrix returned by the gradient function (argument 'grad') must have at least as many rows as the number of parameters (10), where each row must correspond to the gradients of the log-likelihood function of an individual (independent) observation: currently, there are (is) 10 parameter(s) but the gradient matrix has only 2 row(s) * It seems it is expecting as many rows as there are parameters. So, I changed my likelihood function so that it would return the transpose of the earlier matrix (hence returning a matrix with rows equaling parameters and columns, observations). However, when I run the function again, I still get an error: *Error in gr[, fixed] <- NA : (subscript) logical subscript too long* I have verified that my gradient function, when summed across observations gives the same results as the in built numerical gradient (to the 11th decimal place - after that, they differ since R's function is numerical). I am trying to run a very large estimation (1000's of observations and 821 parameters) and all of the other methods are taking way too much time (days). This method is our last hope and so, any help will be greatly appreciated. 
-- 
Thanks in advance,
Rohit
Mob: 91 9819926213

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

From ehlers at ucalgary.ca Wed May 4 04:06:03 2011
From: ehlers at ucalgary.ca (P Ehlers)
Date: Tue, 03 May 2011 20:06:03 -0600
Subject: [R] na.omit - Is it working properly?
In-Reply-To: <9DA5872FEF993D41B7173F58FCF6BE94D8720D65@orsmsx504.amr.corp.intel.com>
References: <9DA5872FEF993D41B7173F58FCF6BE94D8720D65@orsmsx504.amr.corp.intel.com>
Message-ID: <4DC0B48B.9040100@ucalgary.ca>

Kalicin, Sarah wrote:
> I have a work around for this, but can someone explain why the first example does not work properly? I believed it worked in the previous version of R, by selecting just the rows=200525 and omitting the na's.

You can prove this statement by providing reproducible code that
we can test.

Peter Ehlers

> I just upgraded to 2.13. I am also concern with the row numbers being different in the selections, should I be worried? FYI, I just selected the first few rows for demonstration, please do not worry that the number of rows shown are not equal. - Sarah
>
> With na.omit around the column, but it is showing other values in the F.WW column other than 200525, along with NA. I was hoping that this would omit all the NA's, and show all the rows that P$F.WW=200525. I believe it did with the previous version of R.
> P[na.omit(P$F.WW)==200525, c(51, 52)]
>       F.WW   R.WW
> 45  200525     NA
> 53      NA     NA
> 61  200534 200534
> 63  200608 200608
> 66  200522 200541
> 80      NA     NA
> 150 200521 200516
> 231 200530 200530
>
> No na.omit, the F.WW=200525 seems to work, but lots of NA included. This is what is expected!! The row numbers are not the same as the above example, except the first row.
>> P[P$F.WW==200525, c(51, 52)] > F.WW R.WW > 45 200525 NA > NA NA NA > NA.1 NA NA > NA.2 NA NA > NA.3 NA NA > 57 200525 200526 > 65 200525 NA > 67 200525 NA > 70 200525 200525 > NA.4 NA NA > NA.5 NA NA > 86 200525 NA > > Na.omit excludes the na's. This is what I want. The concern I have is why the row numbers do not match any of those shown in the examples above. >> na.omit(P[P$F.WW==200525, c(51, 52)]) > F.WW R.WW > 57 200525 200526 > 70 200525 200525 > 161 200525 200525 > 245 200525 200525 > 246 200525 200525 > 247 200525 200526 > 256 200525 200525 > 266 200525 200525 > 269 200525 200525 > 271 200525 200526 > 276 200525 200526 > 278 200525 200526 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From antrael at hotmail.com Wed May 4 05:17:44 2011 From: antrael at hotmail.com (jouba) Date: Tue, 3 May 2011 20:17:44 -0700 (PDT) Subject: [R] Specification of the model In-Reply-To: References: <1301253139729-3409642.post@n4.nabble.com> <4D9073F7.2040309@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From usha.nair at tcs.com Wed May 4 07:03:52 2011 From: usha.nair at tcs.com (Usha) Date: Tue, 3 May 2011 22:03:52 -0700 (PDT) Subject: [R] fitting distributions using fitdistr (MASS) In-Reply-To: <1304416399408-3492103.post@n4.nabble.com> References: <1304416399408-3492103.post@n4.nabble.com> Message-ID: <1304485432124-3494532.post@n4.nabble.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From dwinsemius at comcast.net Wed May 4 07:30:07 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 3 May 2011 22:30:07 -0700 Subject: [R] fitting distributions using fitdistr (MASS) In-Reply-To: <1304485432124-3494532.post@n4.nabble.com> References: <1304416399408-3492103.post@n4.nabble.com> <1304485432124-3494532.post@n4.nabble.com> Message-ID: On May 3, 2011, at 10:03 PM, Usha wrote: > Thanks for the help. > I would like to explain my problem. > I have sample of scores from tests which varies form 0 to 35. > Now, i want to find out the best fit distribution for this data. I > need to > order the distributions based on their best fit. > For this i am using the function fitdistr(). [One of the Ref.used : > FITTING > DISTRIBUTIONS WITH R by Vito Ricci. ] > > Example: >> scores<-sample(0:35,500,replace=T) >> normalfit<-fitdistr(scores,"normal") >> normalfit > mean sd > 16.8460000 10.1361869 > ( 0.4533041) ( 0.3205344) >> normalfit$loglik > [1] -1867.525 >> kstestnormal<-ks.test(scores,"pnorm",16.8460000, 10.1361869) # for >> the > measure of goodness > > > 1) Am i doing the right thing? No. The most important right thing you are not doing is describing your goals. Clearly you do _not_ want the best fitting distribution, since the best fit distribution would be a multinomial distribution with whatever probabilities would exactly fit the sample. > 2) If yes, can't i follow the same procedure for all the distributions > supported by fitdistr? With the start values wherever necessary? You can do anything you want. But have you considered the power of this method and the error rates? Is there no science behind this to guide what is so far an aimless search strategy? > 3) Do I have to consider/worry about the warnings that I get? We cannot force you to heed the warnings. 
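On the first point above, a rough base R sketch of why the "best fitting" distribution for discrete scores is the multinomial with the observed proportions: its log-likelihood can never be lower than that of any other distribution on the same support. The comparison with a discrete uniform on 0:35 is only an illustrative choice (the example scores were simulated that way), and the names `n`, `p_hat`, `ll_saturated` and `ll_uniform` are made up for this sketch.

```r
set.seed(1)
scores <- sample(0:35, 500, replace = TRUE)   # same simulation as in the question

# Observed counts for every possible score, including scores never seen.
n     <- as.numeric(table(factor(scores, levels = 0:35)))
p_hat <- n / sum(n)                           # multinomial MLE: observed proportions

# Log-likelihood of the saturated multinomial (terms with n = 0 contribute 0).
nz <- n > 0
ll_saturated <- sum(n[nz] * log(p_hat[nz]))

# Log-likelihood of a discrete uniform on 0:35, evaluated on the same data.
ll_uniform <- sum(n) * log(1 / 36)

c(saturated = ll_saturated, uniform = ll_uniform)
```

The saturated fit wins by construction while using 35 free parameters, which is why a ranking by fit alone says little without a stated goal.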
-- David Winsemius, MD Heritage Laboratories West Hartford, CT From pdalgd at gmail.com Wed May 4 08:02:51 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Wed, 4 May 2011 08:02:51 +0200 Subject: [R] na.omit - Is it working properly? In-Reply-To: <9DA5872FEF993D41B7173F58FCF6BE94D8720D65@orsmsx504.amr.corp.intel.com> References: <9DA5872FEF993D41B7173F58FCF6BE94D8720D65@orsmsx504.amr.corp.intel.com> Message-ID: On May 3, 2011, at 21:18 , Kalicin, Sarah wrote: > > I have a work around for this, but can someone explain why the first example does not work properly? I believed it worked in the previous version of R, by selecting just the rows=200525 and omitting the na's. I just upgraded to 2.13. I am also concern with the row numbers being different in the selections, should I be worried? FYI, I just selected the first few rows for demonstration, please do not worry that the number of rows shown are not equal. - Sarah > > With na.omit around the column, but it is showing other values in the F.WW column other than 200525, along with NA. I was hoping that this would omit all the NA's, and show all the rows that P$F.WW=200525. I believe it did with the previous version of R. That's highly unlikely. na.omit(P$WW) has fewer elements than there are rows in P so you get vector recycling in the style of > thuesen[c(F,F,F,F,T),] blood.glucose short.velocity 5 7.2 1.27 10 12.2 1.22 15 6.7 1.52 20 16.1 1.05 (now why don't we get the usual warning about "not a multiple of" in this case?) Worse, if you omit observations prior to comparison, the result won't line up. E.g. in the thuesen data, obs. 
> thuesen[na.omit(thuesen$short.velocity)==1.12,] blood.glucose short.velocity 16 8.6 NA 22 4.9 1.03 whereas in fact > subset(thuesen, short.velocity==1.12) blood.glucose short.velocity 17 4.2 1.12 23 8.8 1.12 > P[na.omit(P$F.WW)==200525, c(51, 52)] > F.WW R.WW > 45 200525 NA > 53 NA NA > 61 200534 200534 > 63 200608 200608 > 66 200522 200541 > 80 NA NA > 150 200521 200516 > 231 200530 200530 > > No na.omit, the F.WW=200525 seems to work, but lots of NA included. This is what is expected!! The row numbers are not the same as the above example, except the first row. >> P[P$F.WW==200525, c(51, 52)] > F.WW R.WW > 45 200525 NA > NA NA NA > NA.1 NA NA > NA.2 NA NA > NA.3 NA NA > 57 200525 200526 > 65 200525 NA > 67 200525 NA > 70 200525 200525 > NA.4 NA NA > NA.5 NA NA > 86 200525 NA Presumably, a number of rows got omitted here? The NA's are a bit of a pain, but that's the way things work: If there is an observation that you don't know whether to include, you get an NA filled row. > thuesen[thuesen$short.velocity==1.12,] blood.glucose short.velocity NA NA NA 17 4.2 1.12 23 8.8 1.12 To avoid this, you explicitly test for NA using is.na() or use subset() which does it internally. > > Na.omit excludes the na's. This is what I want. The concern I have is why the row numbers do not match any of those shown in the examples above. >> na.omit(P[P$F.WW==200525, c(51, 52)]) > F.WW R.WW > 57 200525 200526 > 70 200525 200525 > 161 200525 200525 > 245 200525 200525 > 246 200525 200525 > 247 200525 200526 > 256 200525 200525 > 266 200525 200525 > 269 200525 200525 > 271 200525 200526 > 276 200525 200526 > 278 200525 200526 > Well, now you remove rows with NA _anywhere_, so e.g. row #65 is out because R.WW is missing. I expect #161 and higher was just chopped from the earlier list. In short, nothing out of the ordinary seems to be going on here. 
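The NA behaviour described above can be reproduced on a small made-up data frame. This is only a sketch: the names d, f and r are hypothetical stand-ins for the poster's P, F.WW and R.WW, and only base R is assumed.

```r
# Toy data frame; 'f' and 'r' stand in for the F.WW and R.WW columns.
d <- data.frame(f = c(1, NA, 2, 1, NA, 1),
                r = c(NA, 5, 6, 7, NA, 8))

# Plain logical indexing returns one all-NA row for every NA in the condition.
with_na <- d[d$f == 1, ]

# Testing for NA explicitly drops those rows; subset() does the same internally.
by_isna   <- d[!is.na(d$f) & d$f == 1, ]
by_subset <- subset(d, f == 1)

# na.omit() on the indexed result also drops rows with an NA in any other column.
complete <- na.omit(d[d$f == 1, ])
```

The original row names are kept in every case, which is why the row numbers in the na.omit() output need not match those printed for a truncated listing.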
-- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From mhramire at uc.cl Wed May 4 06:29:37 2011 From: mhramire at uc.cl (=?ISO-8859-1?Q?Mat=EDas_Ram=EDrez_Salgado?=) Date: Wed, 4 May 2011 01:29:37 -0300 Subject: [R] problem with package "adapt" for R in Mac Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sterlesser at hotmail.com Wed May 4 06:08:03 2011 From: sterlesser at hotmail.com (sterlesser) Date: Tue, 3 May 2011 21:08:03 -0700 (PDT) Subject: [R] nls problem with R Message-ID: <1304482083098-3494454.post@n4.nabble.com> the original data are V2 =c(371000,285000 ,156000, 20600, 4420, 3870, 5500 ) T2=c( 0.3403 ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553) nls2=nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9,cl=6.2,t0=8.7)) after execution error occurs as below ################################################################ Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates Error in nlsModel(formula, mf, start, wts) : singular gradient matrix at initial parameter estimates In addition: Warning messages: 1: In lhs - rhs : longer object length is not a multiple of shorter object length 2: In .swts * attr(rhs, "gradient") : longer object length is not a multiple of shorter object length could anyone help me ?thansks -- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3494454.html Sent from the R help mailing list archive at Nabble.com. 
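The two warnings quoted above ("longer object length is not a multiple of shorter object length") usually indicate that the response and the predictor differ in length. A quick check on the data exactly as posted, offered as a sketch rather than a diagnosis of the singular gradient itself:

```r
# Data copied from the post above.
V2 <- c(371000, 285000, 156000, 20600, 4420, 3870, 5500)
T2 <- c(0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.553)

# nls() needs one T2 value for every V2 value.
n_V2 <- length(V2)
n_T2 <- length(T2)
same_length <- n_V2 == n_T2

print(c(V2 = n_V2, T2 = n_T2))
```

Which observation is missing, or surplus, cannot be determined from the post alone.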
From nspence3 at binghamton.edu Wed May 4 06:30:01 2011 From: nspence3 at binghamton.edu (Nekeisha Spencer) Date: Tue, 3 May 2011 21:30:01 -0700 (PDT) Subject: [R] HELP Message-ID: <645695.26325.qm@web161416.mail.bf1.yahoo.com> Accelerated Failure Time Model to Proportional Hazard Form Greetings R users: I have been working on a problem for a while and can't seem to get any result. I am trying to convert accelerated failure time estimates to proportional form. I keep getting an error that I can't understand and don't know if it can be debugged or not. My code is as follows: data<-read.csv("repdata.csv") da<-na.omit(data) attach(da) library(survival) library(rms) weib1<-psm(Surv(time,ratified)~island+pop+gdppc+fh+nngo+industryemplimp+icrgfromper+ngoicrgfromper+industryemplimpicrgfromper, data=da, dist="weibull",init=NULL, scale=0, control=survreg.control(maxiter=1000)) print(weib1) f.ph<-pphsm(weib1) However, after running the last line of code, I get the following: Warning message: In pphsm(weib1) : at present, pphsm does not return the correct covariance matrix Any suggestions will be greatly appreciated!! My data is attached. Thanks in Advance NSPENCER From veepsirtt at gmail.com Wed May 4 06:32:38 2011 From: veepsirtt at gmail.com (veepsirtt) Date: Tue, 3 May 2011 21:32:38 -0700 (PDT) Subject: [R] RStudio -manipulate command In-Reply-To: References: <1303805889697-3474947.post@n4.nabble.com> <1303807508963-3474976.post@n4.nabble.com> <1303827006861-3475544.post@n4.nabble.com> <1303962711678-3480083.post@n4.nabble.com> Message-ID: <1304483558429-3494489.post@n4.nabble.com> Why the mean value " h" is not changing as the slider moves from 0 to 25 ?. It remains always constant. 
library(manipulate) example <- function(x.max){ plot(cars, xlim=c(0,x.max)) abline(h=mean(cars$dist),col="blue",lty=2) } manipulate( example(x.max), x.max=slider(0,25, step=5) ) veepsirtt -- View this message in context: http://r.789695.n4.nabble.com/RStudio-manipulate-command-tp3474947p3494489.html Sent from the R help mailing list archive at Nabble.com. From shekhar2581 at gmail.com Wed May 4 07:24:56 2011 From: shekhar2581 at gmail.com (Shekhar) Date: Tue, 3 May 2011 22:24:56 -0700 (PDT) Subject: [R] How to fit a random data into Beta distribution? In-Reply-To: References: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com> Message-ID: Hi Steven, Thanks for the quick reply. i have tried but its giving me error--->Error in optim(x = c(38.1815173696765, -12.7988197976440, -3.88212459045077, : initial value in 'vmmin' is not finite i have tried something like this: library(MASS) x<-rnorm(n=100,mean=10,sd=20); fitdistr(x,dbeta,start=list(shape1=1,shape2=1) Please correct me if my understanding is wrong: In the fitdistr fucntion we are providing the initial values of the Beta distribution parameters as shape1=1 and shape2=1. This function will try to fit the data and give us the new parameters of Beta distribution that approximately fits this data. I have tried the function with other distribution like Normal, Gamma, Weibull...its working fine.. Regards, Som Shekhar On May 4, 1:25?am, Steven Kennedy wrote: > library(MASS) > fitdistr(x,"beta",list(shape1=1,shape2=1)) > > > > On Tue, May 3, 2011 at 9:44 PM, Shekhar wrote: > > > Hi, > > I have some random data and i want to find out the parameters of Beta > > distribution ( a and b) such that this data approximately fits into > > this distribution. I have tried by plot the histograms and graph, but > > it requires lot of tuning and i am unable to do that. can anyone tell > > me how to do it programmitically in R? 
> > > Regards, > > Som Shekhar > > > ______________________________________________ > > R-h... at r-project.org mailing list > >https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-h... at r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From arne.henningsen at googlemail.com Wed May 4 08:53:26 2011 From: arne.henningsen at googlemail.com (Arne Henningsen) Date: Wed, 4 May 2011 08:53:26 +0200 Subject: [R] help with the maxBHHH routine In-Reply-To: References: Message-ID: Dear Rohit On 3 May 2011 22:53, Rohit Pandey wrote: > Hello R community, > > I have been using R's inbuilt maximum likelihood functions, for the > different methods (NR, BFGS, etc). > > I have figured out how to use all of them except the maxBHHH function. This > one is different from the others as it requires an observation level > gradient. > > I am using the following syntax: > > maxBHHH(logLik,grad=nuGradient,finalHessian="BHHH",start=prm,iterlim=2) > > where logLik is the likelihood function and returns a vector of observation > level likelihoods and nuGradient is a function that returns a matrix with > each row corresponding to a single observation and the columns corresponding > to the gradient values for each parameter (as is mentioned in the online > help). 
> > however, this gives me the following error: > > *Error in checkBhhhGrad(g = gr, theta = theta, analytic = (!is.null(attr(f, > : > ?the matrix returned by the gradient function (argument 'grad') must have > at least as many rows as the number of parameters (10), where each row must > correspond to the gradients of the log-likelihood function of an individual > (independent) observation: > ?currently, there are (is) 10 parameter(s) but the gradient matrix has only > 2 row(s) > * > It seems it is expecting as many rows as there are parameters. So, I changed > my likelihood function so that it would return the transpose of the earlier > matrix (hence returning a matrix with rows equaling parameters and columns, > observations). > > However, when I run the function again, I still get an error: > *Error in gr[, fixed] <- NA : (subscript) logical subscript too long* > > I have verified that my gradient function, when summed across observations > gives the same results as the in built numerical gradient (to the 11th > decimal place - after that, they differ since R's function is numerical). > > I am trying to run a very large estimation (1000's of observations and 821 > parameters) and all of the other methods are taking way too much time > (days). This method is our last hope and so, any help will be greatly > appreciated. Please make yourself familiar with the BHHH algorithm and read the documentation of maxBHHH: it says about argument "grad": "[...] If the BHHH method is used, ?grad? 
must return a matrix, where rows correspond to the gradient vectors of individual observations and the columns to the individual parameters.[...]" More information of the maxLik package is available at: http://dx.doi.org/10.1007/s00180-010-0217-1 Best regards, Arne -- Arne Henningsen http://www.arne-henningsen.name From A.Robinson at ms.unimelb.edu.au Wed May 4 09:15:06 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 17:15:06 +1000 Subject: [R] nls problem with R In-Reply-To: <1304482083098-3494454.post@n4.nabble.com> References: <1304482083098-3494454.post@n4.nabble.com> Message-ID: <20110504071506.GU48756@ms.unimelb.edu.au> The fact that T2 and V2 are of different lengths seems like a likely culprit. Other than that, you need to find start points that do not lead to a singular gradient. There are several books that provide advice on obtaining initial parameter estimates for non-linear models. Google Books might help you. Cheers Andrew On Tue, May 03, 2011 at 09:08:03PM -0700, sterlesser wrote: > the original data are > V2 =c(371000,285000 ,156000, 20600, 4420, 3870, 5500 ) > T2=c( 0.3403 ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553) > nls2=nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9,cl=6.2,t0=8.7)) > after execution error occurs as below > ################################################################ > Error in nlsModel(formula, mf, start, wts) : > singular gradient matrix at initial parameter estimates > Error in nlsModel(formula, mf, start, wts) : > singular gradient matrix at initial parameter estimates > In addition: Warning messages: > 1: In lhs - rhs : > longer object length is not a multiple of shorter object length > 2: In .swts * attr(rhs, "gradient") : > longer object length is not a multiple of shorter object length > > could anyone help me ?thansks > > -- > View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3494454.html > Sent from the R help mailing list 
archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Wed May 4 09:19:19 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 17:19:19 +1000 Subject: [R] problem with package "adapt" for R in Mac In-Reply-To: References: Message-ID: <20110504071919.GV48756@ms.unimelb.edu.au> Hi, Is there such a package? I can't find it on CRAN. Can you let us know exactly how you tried to install it, and what the error message was (if any)? Cheers Andrew On Wed, May 04, 2011 at 01:29:37AM -0300, Matías Ramírez Salgado wrote: > Hi, > > How i can install the package "adapt" in some version of R for mac? > > i try in 2.13, 2.9,2.7 and other previous versions... and nothing happens. > > and another question: There are some packages that do the same but that it > is implemented for mac? (calculate integrals in 2 or more dimmensions). > > help me please, it's for an important work. > > greetings. > > > -- > Matías Hernán Ramírez Salgado. > Statistics student. > Pontificia Universidad Católica de Chile. 
> > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From jim at bitwrit.com.au Wed May 4 09:22:07 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Wed, 04 May 2011 17:22:07 +1000 Subject: [R] Constructing a histogram with words as labels as height as frequency? In-Reply-To: References: Message-ID: <4DC0FE9F.7090001@bitwrit.com.au> On 05/04/2011 03:35 AM, Caitlin wrote: > Hi all. > > I need to construct a plot showing words on the x-axis and how many times > each word was given as a verbal response on the y-axis as solid bar > (frequency). Is there a convenient function to do this in R? I considered > hist(), but I'm not sure how to construct the text file. 
Example: > > apple, 2 > pear, 14 > house, 1 > beach, 5 > computer, 15 > Hi Caitlin, barp(caitlinsdata[,2],names.arg=caitlinsdata[,1]) Jim From e.hofstadler at gmail.com Wed May 4 09:25:06 2011 From: e.hofstadler at gmail.com (E Hofstadler) Date: Wed, 4 May 2011 10:25:06 +0300 Subject: [R] adding columns to dataframes contained in a list Message-ID: hi there, I have a list of 5 identical dataframes: mydf <- data.frame(x=c(1:5), y=c(21:25)) mylist <- rep(list(mydf),5) and a factor variable with 5 levels: foo <- c(letters[1:5]) foo <- as.factor(foo) Question: I'd like to add a new variable to each dataframe in the list, each containing only one level of the factor variable. So mylist[[1]] should have a new variable z containing only "a", in mylist[[2]] the new variable z should contain only "b", etc. (How) can this be done without looping? All help is greatly appreciated. Best, Esther From rroa at azti.es Wed May 4 09:50:22 2011 From: rroa at azti.es (Rubén Roa) Date: Wed, 4 May 2011 09:50:22 +0200 Subject: [R] nls problem with R In-Reply-To: <20110504071506.GU48756@ms.unimelb.edu.au> References: <1304482083098-3494454.post@n4.nabble.com> <20110504071506.GU48756@ms.unimelb.edu.au> Message-ID: <5CD78996B8F8844D963C875D3159B94A02354D39@dsrcorreo> In addition to Andrew's advice, you should get more familiar with your nonlinear model. From what you wrote, as T2 tends to infinity, V2 tends to v0*(1-epi). There you have a baseline on the Y-axis towards which your model tends, and this will give you sensible starting values for v0 and epi. Also, as T2 tends to 0, V2 tends to v0*(1-epi*(1-exp(cl*t0))). There you have another, higher point on the Y-axis, and this one will give you additional sensible starting values for cl and t0. Plot the data and the predicted model with your initial values, and send the model-data combination to the optimizer once you see that the predicted line is close to the observed response. 
V2 <- c(371000, 285000, 156000, 20600, 4420, 3870, 5500)
T2 <- c(0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.553, 1.333) # last value inserted for illustration.
# nls2 <- nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9,cl=6.2,t0=8.7))
v0.ini <- 10^7
epi.ini <- 0.9
cl.ini <- 6.2
t0.ini <- 8.7
V2.pred.ini <- v0.ini*(1-epi.ini+epi.ini*exp(-cl.ini*(T2-t0.ini)))
plot(T2,V2)
lines(T2,V2.pred.ini)
As you can see, with your initial values the line doesn't even show on the plot. No wonder the gradients are singular. So go find better initial values by trial and error and check the results on the plot. Then the optimizer called by nls will finish the job (hopefully). Then you repeat your plot, this time with the estimates instead of the initial values. This may get you started in the business of estimating nonlinear models. HTH Rubén ____________________________________________________________________________________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On behalf of Andrew Robinson > Sent: Wednesday, 04 May 2011 9:15 > To: sterlesser > CC: r-help at r-project.org > Subject: Re: [R] nls problem with R > > The fact that T2 and V2 are of different lengths seems like a > likely culprit. Other than that, you need to find start > points that do not lead to a singular gradient. There are > several books that provide advice on obtaining initial > parameter estimates for non-linear models. Google Books > might help you. 
> > Cheers > > Andrew > > > > > On Tue, May 03, 2011 at 09:08:03PM -0700, sterlesser wrote: > > the original data are > > V2 =c(371000,285000 ,156000, 20600, 4420, 3870, 5500 ) T2=c( 0.3403 > > ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553) > > > nls2=nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9 > > ,cl=6.2,t0=8.7)) > > after execution error occurs as below > > ################################################################ > > Error in nlsModel(formula, mf, start, wts) : > > singular gradient matrix at initial parameter estimates Error in > > nlsModel(formula, mf, start, wts) : > > singular gradient matrix at initial parameter estimates > In addition: > > Warning messages: > > 1: In lhs - rhs : > > longer object length is not a multiple of shorter object length > > 2: In .swts * attr(rhs, "gradient") : > > longer object length is not a multiple of shorter object length > > > > could anyone help me ?thansks > > > > -- > > View this message in context: > > > http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3494454.htm > > l Sent from the R help mailing list archive at Nabble.com. > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > -- > Andrew Robinson > Program Manager, ACERA > Department of Mathematics and Statistics Tel: > +61-3-8344-6410 > University of Melbourne, VIC 3010 Australia > (prefer email) > http://www.ms.unimelb.edu.au/~andrewpr Fax: > +61-3-8344-4599 > http://www.acera.unimelb.edu.au/ > > Forest Analytics with R (Springer, 2011) > http://www.ms.unimelb.edu.au/FAwR/ > Introduction to Scientific Programming and Simulation using R > (CRC, 2009): > http://www.ms.unimelb.edu.au/spuRs/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From savicky at praha1.ff.cuni.cz Wed May 4 09:50:59 2011 From: savicky at praha1.ff.cuni.cz (Petr Savicky) Date: Wed, 4 May 2011 09:50:59 +0200 Subject: [R] Simple loop In-Reply-To: <77EB52C6DD32BA4D87471DCD70C8D700042B94AC@NA-PA-VBE03.na.tibco.com> References: <1304437471793-3492819.post@n4.nabble.com> <77EB52C6DD32BA4D87471DCD70C8D700042B94AC@NA-PA-VBE03.na.tibco.com> Message-ID: <20110504075059.GA1116@praha1.ff.cuni.cz> On Tue, May 03, 2011 at 12:04:47PM -0700, William Dunlap wrote: [...] > ave() can deal that problem: > > cbind(x, newCol2 = with(x, ave(H, Site, Prof, > FUN=function(y)y-min(y)))) > Site Prof H newCol2 > 1 1 1 24 8 > 2 1 1 16 0 > 3 1 1 67 51 > 4 1 2 23 0 > 5 1 2 56 33 > 6 1 2 45 22 > 7 2 1 67 21 > 8 2 1 46 0 > Warning message: > In min(y) : no non-missing arguments to min; returning Inf > The warning is unfortunate: ave() calls FUN even for when > there is no data for a particular group (Site=2, Prof=2 in this > case). The warning may be avoided using min(y, Inf) instead of min(). 
cbind(x, newCol2 = with(x, ave(H, Site, Prof, FUN=function(y)y-min(y,Inf)))) Site Prof H newCol2 1 1 1 24 8 2 1 1 16 0 3 1 1 67 51 4 1 2 23 0 5 1 2 56 33 6 1 2 45 22 7 2 1 67 21 8 2 1 46 0 Another approach is to combine Site, Prof to a single column in any way suitable for the application. For example cbind(x, newCol2 = with(x, ave(H, paste(Site, Prof), FUN=function(y)y-min(y)))) Petr Savicky. From jeanpaul.ebejer at inhibox.com Wed May 4 11:03:39 2011 From: jeanpaul.ebejer at inhibox.com (JP) Date: Wed, 4 May 2011 10:03:39 +0100 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: References: Message-ID: On 3 May 2011 20:50, peter dalgaard wrote: > > On Apr 28, 2011, at 15:18 , JP wrote: > >> >> >> I have found that when doing a wilcoxon signed ranked test you should report: >> >> - The median value (and not the mean or sd, presumably because of the >> underlying potential non normal distribution) >> - The Z score (or value) >> - r >> - p value >> > > ...printed on 40g/m^2 acid free paper with a pencil of 3B softness? > > Seriously, with nonparametrics, the p value is the only thing of real interest, the other stuff is just attempting to check on authors doing their calculations properly. The median difference is of some interest, but it is not actually what is being tested, and in heavily tied data, it could even be zero with a highly significant p-value. The Z score can in principle be extracted from the p value (qnorm(p/2), basically) but it's obviously unstable in the extreme cases. What is r? The correlation? Pearson, not Spearman? > Thanks for this Peter - a couple of more questions: a <- rnorm(500) b <- runif(500, min=0, max=1) x <- wilcox.test(a, b, alternative="two.sided", exact=T, paired=T) x$statistic V 31835 What is V? (is that the value Z of the test statistic)? z.score <- qnorm(x$p.value/2) [1] -9.805352 But what does this zscore show in practice? The d.f. 
are suggested to be reported here: http://staff.bath.ac.uk/pssiw/stats2/page2/page3/page3.html And r is mentioned here http://huberb.people.cofc.edu/Guide/Reporting_Statistics%20in%20Psychology.pdfs >> My questions are: >> >> - Are the above enough/correct values to report (some places even >> quote W and df) ? > > df is silly, and/or blatantly wrong... > >> ?What else would you suggest? >> - How do I calculate the Z score and r for the above example? >> - How do I get each statistic from the pairwise.wilcox.test call? >> >> Many Thanks >> JP >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk ?Priv: PDalgd at gmail.com > > From joda2457 at student.uu.se Wed May 4 11:17:29 2011 From: joda2457 at student.uu.se (Joel) Date: Wed, 4 May 2011 02:17:29 -0700 (PDT) Subject: [R] Read last line of a file Message-ID: <1304500649516-3494963.post@n4.nabble.com> Hi dear R users. I got a file that I need to extract the third column (or word) of the last line of files that has a diffrent amounts of rows. It works with x<-read.tables("file") x[1,3] This returns the proper result but as the files is large this takes time and uses memory that is just unneccery. p<-read.table(textConnection(system("tail -1 file",intern=TRUE))) p[1,3] This also returns the proper result but then requires the system to be unix based witch is quite silly if you ask me. Would rather just use R commands. So Im wondering if anyone of you got a better way of reading the last line of a file and extracting the third column (or word) of that line. 
Best regards //Joel Damberg -- View this message in context: http://r.789695.n4.nabble.com/Read-last-line-of-a-file-tp3494963p3494963.html Sent from the R help mailing list archive at Nabble.com. From ligges at statistik.tu-dortmund.de Wed May 4 11:32:28 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Wed, 04 May 2011 11:32:28 +0200 Subject: [R] problem with package "adapt" for R in Mac In-Reply-To: <20110504071919.GV48756@ms.unimelb.edu.au> References: <20110504071919.GV48756@ms.unimelb.edu.au> Message-ID: <4DC11D2C.3020901@statistik.tu-dortmund.de> On 04.05.2011 09:19, Andrew Robinson wrote: > Hi, > > Is there such a package? There was such a package that is archived now. > I can't find it on CRAN. Can you let us > know exactly how you tried to install it, and what the error message > was (if any)? The OP is probably looking for packages such as - R2Cuba - cubature Uwe Ligges > > Cheers > > Andrew > > > On Wed, May 04, 2011 at 01:29:37AM -0300, Mat?as Ram?rez Salgado wrote: >> Hi, >> >> How i can install the package "adapt" in some version of R for mac? >> >> i try in 2.13, 2.9,2.7 and other previous versions... and nothing happens. >> >> and another question: There are some packages that do the same but that it >> is implemented for mac? (calculate integrals in 2 or more dimmensions). >> >> help me please, it's for an important work. >> >> greetings. >> >> >> -- >> Mat?as Hern?n Ram?rez Salgado. >> Estudiante de Estad?stica. >> Pontificia Universidad Cat?lica de Chile. >> >> [[alternative HTML version deleted]] >> > >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> > From savicky at praha1.ff.cuni.cz Wed May 4 11:34:36 2011 From: savicky at praha1.ff.cuni.cz (Petr Savicky) Date: Wed, 4 May 2011 11:34:36 +0200 Subject: [R] Read last line of a file In-Reply-To: <1304500649516-3494963.post@n4.nabble.com> References: <1304500649516-3494963.post@n4.nabble.com> Message-ID: <20110504093436.GA7453@praha1.ff.cuni.cz> On Wed, May 04, 2011 at 02:17:29AM -0700, Joel wrote: > Hi dear R users. > > I got a file that I need to extract the third column (or word) of the last > line of files that has a diffrent amounts of rows. > It works with > > x<-read.tables("file") > x[1,3] > > This returns the proper result but as the files is large this takes time and > uses memory that is just unneccery. > > p<-read.table(textConnection(system("tail -1 file",intern=TRUE))) > p[1,3] > > This also returns the proper result but then requires the system to be unix > based witch is quite silly if you ask me. Would rather just use R commands. > > So Im wondering if anyone of you got a better way of reading the last line > of a file and extracting the third column (or word) of that line. Hi. The following reads the file into memory, but it is more efficient than read.table(), since it does no parsing of the file as a whole. x <- readLines("file") strsplit(x[length(x)], " +")[[1]][3] Hope this helps. Petr Savicky. From stevenkennedy2263 at gmail.com Wed May 4 11:46:54 2011 From: stevenkennedy2263 at gmail.com (Steven Kennedy) Date: Wed, 4 May 2011 19:46:54 +1000 Subject: [R] How to fit a random data into Beta distribution? In-Reply-To: References: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com> Message-ID: Hi Shekhar, It looks from your error that you have values outside the range 0 to 1. The beta distribution is only defined between 0 and 1. Can you please post your data set (or at least a portion of it)? On Wed, May 4, 2011 at 3:24 PM, Shekhar wrote: > Hi Steven, > Thanks for the quick reply. 
i have tried but > its giving me error--->Error in optim(x = c(38.1815173696765, > -12.7988197976440, -3.88212459045077, ?: > ?initial value in 'vmmin' is not finite > > > i have tried something like this: > > library(MASS) > x<-rnorm(n=100,mean=10,sd=20); > fitdistr(x,dbeta,start=list(shape1=1,shape2=1) > > Please correct me if my understanding is wrong: > In the fitdistr fucntion we are providing the initial values of the > Beta distribution parameters as shape1=1 and shape2=1. This function > will try to fit the data and give us the new parameters of Beta > distribution that approximately fits this data. > > > I have tried the function with other distribution like Normal, Gamma, > Weibull...its working fine.. > > Regards, > Som Shekhar > > > > > On May 4, 1:25?am, Steven Kennedy wrote: >> library(MASS) >> fitdistr(x,"beta",list(shape1=1,shape2=1)) >> >> >> >> On Tue, May 3, 2011 at 9:44 PM, Shekhar wrote: >> >> > Hi, >> > I have some random data and i want to find out the parameters of Beta >> > distribution ( a and b) such that this data approximately fits into >> > this distribution. I have tried by plot the histograms and graph, but >> > it requires lot of tuning and i am unable to do that. can anyone tell >> > me how to do it programmitically in R? >> >> > Regards, >> > Som Shekhar >> >> > ______________________________________________ >> > R-h... at r-project.org mailing list >> >https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-h... at r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From r.t.a.j.leenders at rug.nl Wed May 4 11:57:46 2011 From: r.t.a.j.leenders at rug.nl (R.T.A.J.Leenders) Date: Wed, 04 May 2011 11:57:46 +0200 Subject: [R] issue with "strange" characters (locale settings) Message-ID: <7540aede63725.4dc13f3a@rug.nl> WinXP-x32, R-21.13.0 Dear list, I have a problem that (I think) relates to the interaction between Windows and R. I am trying to scrape a table with data on the Hawai'ian Islands, This is my code: library(XML) u <- "http://en.wikipedia.org/wiki/Hawaii" tables <- readHTMLTable(u) Islands <- tables[[5]] The output is (first set of columns): Island Nickname > Islands Island Nickname Location 1 Hawai????i[7] The Big Island 19????34??????N 155????30??????W?????? / ??????19.567 ????N 155.5????W?????? / 19.567; -155.5 2 Maui[8] The Valley Isle 20????48??????N 156????20??????W?????? / ??????20.8????N 156.333????W?????? / 20.8; -156.333 3 Kaho????olawe[9] The Target Isle 20????33??????N 156????36??????W?????? / ??????20.55 ????N 156.6????W?????? / 20.55; -156.6 4 L??na????i[10] The Pineapple Isle 20????50??????N 156????56??????W?????? / ??????20.833????N 15 6.933????W?????? / 20.833; -156.933 5 Moloka????i[11] The Friendly Isle 21????08??????N 157????02??????W?????? / ??????21.133????N 1 57.033????W?????? / 21.133; -157.033 6 O????ahu[12] The Gathering Place 21????28??????N 157????59??????W?????? / ??????21.467????N 1 57.983????W?????? / 21.467; -157.983 7 Kaua????i[13] The Garden Isle 22????05??????N 159????30??????W?????? / ??????22.083 ????N 159.5????W?????? / 22.083; -159.5 8 Ni????ihau[14] The Forbidden Isle 21????54??????N 160????10??????W?????? / ??????21.9????N 160.167????W?????? 
/ 21.9; -160.167 As you can see, there are "weird" characters in there. I have also tried readHTMLTable(u, encoding = "UTF-16") and readHTMLTable(u, encoding = "UTF-8") but that didn't help. It seems to me that there may be an issue with the interaction of the Windows settings of the character set. sessionInfo() gives > sessionInfo() R version 2.13.0 (2011-04-13) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252 [4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] XML_3.2-0.2 > I have also attempted to let R use another setting by entering: Sys.setlocale("LC_ALL", "en_US.UTF-8"), but this yields the response: > Sys.setlocale("LC_ALL", "en_US.UTF-8") [1] "" Warning message: In Sys.setlocale("LC_ALL", "en_US.UTF-8") : OS reports request to set locale to "en_US.UTF-8" cannot be honored > In addition, I have attempted to make the change directly from the windows command prompt, using: "chcp 65001" and variations of that, but that didn't change anything. I have searched the list and the web and have found others bringing forth a similar issues, but have not been able to find a solution. I looks like this is an issue of how Windows and R interact. Unfortunately, all three computers at my disposal have this problem. It occurs both under WinXP-x32 and under Win7-x86. Is there a way to make R override the windows settings or can the issue be solved otherwise? I have also tried other websites, and the issue occurs every time when there is an ??, ??, ??, ??, et cetera in the text-to-be-scraped. 
Thank you, Roger From p_connolly at slingshot.co.nz Wed May 4 12:12:36 2011 From: p_connolly at slingshot.co.nz (Patrick Connolly) Date: Wed, 4 May 2011 22:12:36 +1200 Subject: [R] adding columns to dataframes contained in a list In-Reply-To: References: Message-ID: <20110504101236.GA4893@slingshot.co.nz> On Wed, 04-May-2011 at 10:25AM +0300, E Hofstadler wrote: |> hi there, |> |> I have a list of 5 identical dataframes: |> |> mydf <- data.frame(x=c(1:5), y=c(21:25)) |> mylist <- rep(list(mydf),5) |> |> and a factor variable with 5 levels: |> |> foo <- c(letters[1:5]) |> foo <- as.factor(foo) |> |> |> Question: |> I'd like to add a new variable to each dataframe in the list, each |> containing only one level of the factor variable. So mylist[[1]] |> should have a new variable z containing only "a", in mylist[[2]] the |> new variable z should contain only "b", etc. |> |> (How) can this be done without looping? This will work: zz <- do.call("rbind", mylist) zz$z <- rep(foo, each = 5) split(zz, zz$z) You might want to rename of the list elements if those are inconvenient. HTH |> |> All help is greatly appreciated. |> |> Best, |> Esther |> |> ______________________________________________ |> R-help at r-project.org mailing list |> https://stat.ethz.ch/mailman/listinfo/r-help |> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html |> and provide commented, minimal, self-contained, reproducible code. -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___ Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) ..... Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. 
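As an alternative to the rbind/split approach above, here is a minimal sketch that pairs each data frame with one factor level via Map(). It reuses mydf, mylist and foo from the original question; the name mylist2 is made up for illustration. Unlike rep(foo, each = 5), it does not rely on every data frame having exactly five rows.

```r
mydf   <- data.frame(x = 1:5, y = 21:25)
mylist <- rep(list(mydf), 5)
foo    <- factor(letters[1:5])

# Walk over the list and the factor levels in parallel,
# adding the matching level as a constant column z
mylist2 <- Map(function(d, lev) {
  d$z <- factor(lev, levels = levels(foo))
  d
}, mylist, as.character(foo))

mylist2[[2]]
```

Map() is a thin wrapper around mapply(), so there is still an implicit loop, but no explicit for() is needed.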
From ligges at statistik.tu-dortmund.de Wed May 4 12:13:39 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Wed, 04 May 2011 12:13:39 +0200 Subject: [R] bootstrap vignette construction and package installation In-Reply-To: <4a731e6e-fa8e-4339-8311-5640a7ec99cf@z13g2000prk.googlegroups.com> References: <4a731e6e-fa8e-4339-8311-5640a7ec99cf@z13g2000prk.googlegroups.com> Message-ID: <4DC126D3.3000500@statistik.tu-dortmund.de> On 03.05.2011 22:20, Murat Tasan wrote: > hi all - i'm trying to 'R CMD build' a package, but i have what > appears to be a bootstrapping problem: > i've included a vignette in my package, with R code interwoven (and > built using Sweave), but in this documentation i have a code line: >> library(MyPackage) > > now, when trying to build a .tar.gz install-able version of my package > in a clean state, i remove the original compiled/installed package, > and the build fails because Sweave wants to execute the interwoven R > code even though the package hasn't yet been installed. > > i got around it by temporarily just hiding the the inst/doc/ > directory, installing the now-able-to-build package, then re-building > the package after the installation. > but this is a pretty clunky solution, and i imagine there's a more > elegant way to get around this bootstrapping problem that i'm just not > aware of. > > any thoughts? Not really: R CMD build should install a temporary copy of your package in order to be able to rebuild the vignettes. What is the complete output of R CMD build? Is the package available for us for testing? Uwe Ligges > thanks much for any help, > > -murat > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
From ripley at stats.ox.ac.uk Wed May 4 12:49:12 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 4 May 2011 11:49:12 +0100 (BST) Subject: [R] issue with "strange" characters (locale settings) In-Reply-To: <7540aede63725.4dc13f3a@rug.nl> References: <7540aede63725.4dc13f3a@rug.nl> Message-ID: Oh, please! This is about the contributed package XML, not R and not Windows. Some of us have worked very hard to provide reasonable font support in R, including on Windows. We are given exceedingly little credit, just the brickbats for things for which we are not responsible. (We even work hard to port XML to Windows for you, again with almost zero credit.) That URL is a page in UTF-8, as its header says. We have provided many ways to work with UTF-8 on Windows, but it seems readHTMLTable() is not making use of them. You need to run iconv() on the strings in your object (which as it has factors, are the levels). When you do so, you will discover that page contains characters not in your native charset (I presume, not having your locale). What you can do, in Rgui only, is for (n in names(Islands)) Encoding(levels(Islands[[n]])) <-"UTF-8" but likely there are still characters it will not know how to display. On Wed, 4 May 2011, R.T.A.J.Leenders wrote: > > WinXP-x32, R-21.13.0 > Dear list, > I have a problem that (I think) relates to the interaction between Windows > and R. > I am trying to scrape a table with data on the Hawai'ian Islands, This is my > code: > library(XML) > u <- "http://en.wikipedia.org/wiki/Hawaii" > tables <- readHTMLTable(u) > Islands <- tables[[5]] > The output is (first set of columns): > Island Nickname > > Islands > Island Nickname > Location > 1 Hawai????i[7] The Big Island 19????34??????N 155????30??????W?????? / ??????19.567 > ????N 155.5????W?????? / 19.567; -155.5 > 2 Maui[8] The Valley Isle 20????48??????N 156????20??????W?????? / ??????20.8????N > 156.333????W?????? 
/ 20.8; -156.333 > 3 Kaho????olawe[9] The Target Isle 20????33??????N 156????36??????W?????? / ??????20.55 > ????N 156.6????W?????? / 20.55; -156.6 > 4 L??na????i[10] The Pineapple Isle 20????50??????N 156????56??????W?????? / ??????20.833????N 15 > 6.933????W?????? / 20.833; -156.933 > 5 Moloka????i[11] The Friendly Isle 21????08??????N 157????02??????W?????? / ??????21.133????N 1 > 57.033????W?????? / 21.133; -157.033 > 6 O????ahu[12] The Gathering Place 21????28??????N 157????59??????W?????? / ??????21.467????N 1 > 57.983????W?????? / 21.467; -157.983 > 7 Kaua????i[13] The Garden Isle 22????05??????N 159????30??????W?????? / ??????22.083 > ????N 159.5????W?????? / 22.083; -159.5 > 8 Ni????ihau[14] The Forbidden Isle 21????54??????N 160????10??????W?????? / ??????21.9????N > 160.167????W?????? / 21.9; -160.167 > > As you can see, there are "weird" characters in there. I have also tried > readHTMLTable(u, encoding = "UTF-16") and readHTMLTable(u, encoding = > "UTF-8") > but that didn't help. > It seems to me that there may be an issue with the interaction of the > Windows settings of the character set. 
> sessionInfo() gives > > sessionInfo() > R version 2.13.0 (2011-04-13) > Platform: i386-pc-mingw32/i386 (32-bit) > locale: > [1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 > LC_MONETARY=Dutch_Netherlands.1252 > [4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252 > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] XML_3.2-0.2 > > > I have also attempted to let R use another setting by entering: > Sys.setlocale("LC_ALL", "en_US.UTF-8"), but this yields the response: > > Sys.setlocale("LC_ALL", "en_US.UTF-8") > [1] "" > Warning message: > In Sys.setlocale("LC_ALL", "en_US.UTF-8") : > OS reports request to set locale to "en_US.UTF-8" cannot be honored > > > In addition, I have attempted to make the change directly from the windows > command prompt, using: "chcp 65001" and variations of that, but that didn't > change anything. > I have searched the list and the web and have found others bringing forth a > similar issues, but have not been able to find a solution. I looks like this > is an issue of how Windows and R interact. Unfortunately, all three > computers at my disposal have this problem. It occurs both under WinXP-x32 > and under Win7-x86. > Is there a way to make R override the windows settings or can the issue be > solved otherwise? > I have also tried other websites, and the issue occurs every time when there > is an ??, ??, ??, ??, et cetera in the text-to-be-scraped. > Thank you, > Roger > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. 
Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From patrick.breheny at uky.edu Wed May 4 13:37:49 2011 From: patrick.breheny at uky.edu (Patrick Breheny) Date: Wed, 4 May 2011 07:37:49 -0400 Subject: [R] Overlapping x axes using Lattice In-Reply-To: <787543A672A2C543B8281EEADC2FE9017A7985@WDCWMSP386.network.maf.govt.nz> References: <787543A672A2C543B8281EEADC2FE9017A7940@WDCWMSP386.network.maf.govt.nz> <408338F86F0D4243BD5E7B74A8C0862B20569E206B@EX7FM03.ad.uky.edu> <787543A672A2C543B8281EEADC2FE9017A7985@WDCWMSP386.network.maf.govt.nz> Message-ID: <4DC13A8D.3030408@uky.edu> The scales argument in lattice controls the appearance of the axes. It consists of two lists, one for the x axis and one for the y axis. For example: histogram(~pesti[,1]|pesti[,2]+ pesti[,3],scales=list(x=list(at=c(2,6),labels=c("First","Second")))) This allows you to place the labels wherever you want. --Patrick On 05/03/2011 05:49 PM, Andrew McFadden wrote: > Hi Patrick > > The plot is an example ie not the real thing the x axes is a factor and > cant be numeric ie it is a titre of a laboratory test. If I change the > aspect ratio or the font size they still do not fit. How do you put > fewer labels on ie how do you tell R to skip every second label? > > Regardes > > Andrew McFadden MVS BVSc | Incursion Investigator (animals), > Investigation and Diagnostic Centre | Biosecurity New Zealand > Ministry of Agriculture and Forestry | 66 Ward St, Wallaceville | PO > Box 40 742 | Upper Hutt | New Zealand > Telephone: 64-4-894 5611 | Facsimile: 64-4-894 4973| Mobile: > 027-733-1791 | Web: www.maf.govt.nz > > > -----Original Message----- > From: Breheny, Patrick [mailto:patrick.breheny at uky.edu] > Sent: Wednesday, 4 May 2011 9:45 a.m. 
> To: Andrew McFadden; r-help at r-project.org > Subject: [Requires Classification] RE: Overlapping x axes using Lattice > > I'm not clear on what you're looking for here. Your x-axis is numeric, > why are you converting it to a factor? If you keep it numeric, the > labels don't overlap. Or perhaps you don't want it to be numeric, in > which case why not just change the aspect ratio of the plot until they > no longer overlap? Or change the axis font size? Or put fewer labels > on the axis? There is a finite amount of space on the axis, so you have > to either make more space, shrink the labels, or draw fewer of them. > > > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On Behalf Of Andrew McFadden > Sent: Tuesday, May 03, 2011 5:05 PM > To: r-help at r-project.org > Subject: [R] Overlapping x axes using Lattice > > Hi R users > > I apologise in advance for this question as I suspect it is simple and > perhaps others have had this problem. > I am struggling to sort out how to fix the x axes so that the labels > don't overlap. > > I have put the following example together to show my problem. > > library(lattice) > > titre<- as.factor(rep(c(10999,20999,30999,40999,50999,60999, > 20000,40000,45000,50000,70000,80000),c(2,2,2,2,1,1,2,2,2,2,1,1))) > test<-rep(c("BVD","HSD"),c(10,10)) > age<-as.factor(rep(c(1,2,3,4),c(5,5,5,5))) > > pesti<-data.frame(titre,test,age) > > histogram(~pesti[,1]|pesti[,2]+ pesti[,3] > ,alternating=TRUE,tick.number=1, stack=TRUE,type = "count", > xlab="VNT",rot=c(180,180),draw=FALSE) > > Thank you in advance. 
> > Andy > > Andrew McFadden MVS BVSc > Incursion Investigator > Investigation& Diagnostic Centres - Wallaceville Biosecurity New > Zealand Ministry of Agriculture and Forestry > > Phone 04 894 5600 Fax 04 894 4973 Mobile 029 894 5611 Postal address: > Investigation and Diagnostic Centre- Wallaceville Box 40742 Ward St > Upper Hutt > > > > > > This email message and any attachment(s) is intended solely for the > addressee(s) > named above. The information it contains is confidential and may be > legally > privileged. Unauthorised use of the message, or the information it > contains, > may be unlawful. If you have received this message by mistake please > call the > sender immediately on 64 4 8940100 or notify us by return email and > erase the > original message and attachments. Thank you. > > The Ministry of Agriculture and Forestry accepts no responsibility for > changes > made to this email or to any attachments after transmission from the > office. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > This email message and any attachment(s) is intended solely for the addressee(s) > named above. The information it contains is confidential and may be legally > privileged. Unauthorised use of the message, or the information it contains, > may be unlawful. If you have received this message by mistake please call the > sender immediately on 64 4 8940100 or notify us by return email and erase the > original message and attachments. Thank you. > > The Ministry of Agriculture and Forestry accepts no responsibility for changes > made to this email or to any attachments after transmission from the office. 
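To make the scales suggestion concrete for the "skip every second label" question, here is a minimal sketch, assuming the titre factor from the original post and a single unconditioned panel for brevity. The names lev, keep and p are made up for illustration. For a factor, lattice places the levels at positions 1, 2, ..., so "at" can simply pick every second position.

```r
library(lattice)

# Titre values from the original post, kept as a factor
titre <- factor(rep(c(10999, 20999, 30999, 40999, 50999, 60999,
                      20000, 40000, 45000, 50000, 70000, 80000),
                    c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1)))

lev  <- levels(titre)
keep <- seq(1, length(lev), by = 2)   # positions of every second level

# Label only the kept positions on the x axis
p <- histogram(~ titre, type = "count", xlab = "VNT",
               scales = list(x = list(at = keep, labels = lev[keep])))
# print(p) draws the plot
```

The same scales list should carry over to the conditioned call in the thread; adding rot = 45 inside the x list is another way to buy space.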
> > From andrew.decker.steen at gmail.com Wed May 4 13:49:45 2011 From: andrew.decker.steen at gmail.com (Andrew D. Steen) Date: Wed, 4 May 2011 13:49:45 +0200 Subject: [R] what happens when I store linear models in an array? Message-ID: <4dc13d51.c9860e0a.331c.2bf6@mx.google.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mxkuhn at gmail.com Wed May 4 13:51:55 2011 From: mxkuhn at gmail.com (Max Kuhn) Date: Wed, 4 May 2011 07:51:55 -0400 Subject: [R] Bigining with a Program of SVR In-Reply-To: <1304435997731-3492746.post@n4.nabble.com> References: <1304106463512-3484476.post@n4.nabble.com> <1304429881697-3492487.post@n4.nabble.com> <1304435997731-3492746.post@n4.nabble.com> Message-ID: train() uses vectors, matrices and data frames as input. I really think you need to read materials on basic R before proceeding. Go to the R web page. There are introductory materials there. On Tue, May 3, 2011 at 11:19 AM, ypriverol wrote: > I saw the format of the caret data some days ago. It is possible to convert > my csv data with the same data a format as the caret dataset. My idea is to > use firstly the same scripts as caret tutorial, then i want to remove > problems related with data formats and incompatibilities. > > Thanks for your time > > -- > View this message in context: http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3492746.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Max From fomcl at yahoo.com Wed May 4 13:52:07 2011 From: fomcl at yahoo.com (Albert-Jan Roskam) Date: Wed, 4 May 2011 04:52:07 -0700 (PDT) Subject: [R] first occurrence of a value? 
Message-ID: <688112.87808.qm@web110712.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From d.rizopoulos at erasmusmc.nl Wed May 4 14:07:02 2011 From: d.rizopoulos at erasmusmc.nl (Dimitris Rizopoulos) Date: Wed, 04 May 2011 14:07:02 +0200 Subject: [R] first occurrence of a value? In-Reply-To: <688112.87808.qm@web110712.mail.gq1.yahoo.com> References: <688112.87808.qm@web110712.mail.gq1.yahoo.com> Message-ID: <4DC14166.1030204@erasmusmc.nl> another approach is: df <- data.frame(j1999 = c(0,0,0,0,1,0), j2000 = c(NA,1,1,1,0,0), j2001 = c(1,0,1,0,0,0)) years <- as.numeric(gsub("^[^0-9]+", "", names(df))) ind <- apply(sapply(df, "==", 1), 1, function (x) which(x)[1]) df$year <- years[ind] I hope it helps. Best, Dimitris On 5/4/2011 1:52 PM, Albert-Jan Roskam wrote: > Hello, > > A simple question perhaps, but how do I, within each row, find the first > occurence of the number 1 in the df below? I want to use this position to > programmatically create the variable 'year'. I'v come up with a solution, but I > find it downright ugly. Is there a simpler way? I was hoping for a useful > built-in function that I don;t yet know about. > > df<- data.frame(j1999=c(0,0,0,0,1,0), j2000=c(NA, 1, 1, 1, 0, 0), j2001=c(1, 0, > 1, 0, 0, 0), year=c(2001, 2000, 2000, 2000, 1999, NA)) > library(gsubfn) > x<- apply(df==1, 1, which) > giveYear<- function(df) { return( as.numeric(gsubfn("^[^0-9]+", "", > names(df)[1])) ) } > df$year2<- sapply(x, giveYear) > > Thanks in advance! > > Cheers!! > Albert-Jan > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > All right, but apart from the sanitation, the medicine, education, wine, public > order, irrigation, roads, a fresh water system, and public health, what have the > Romans ever done for us? 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014 Web: http://www.erasmusmc.nl/biostatistiek/ From patrick.breheny at uky.edu Wed May 4 14:17:25 2011 From: patrick.breheny at uky.edu (Patrick Breheny) Date: Wed, 4 May 2011 08:17:25 -0400 Subject: [R] first occurrence of a value? In-Reply-To: <688112.87808.qm@web110712.mail.gq1.yahoo.com> References: <688112.87808.qm@web110712.mail.gq1.yahoo.com> Message-ID: <4DC143D5.3040109@uky.edu> You may want to look into the function 'match', which finds the first occurrence of a value. In your example, df <- data.frame(j1999=c(0,0,0,0,1,0), j2000=c(NA, 1, 1, 1, 0, 0), j2001=c(1, 0, 1, 0, 0, 0), year=c(2001, 2000, 2000, 2000, 1999, NA)) apply(df,1,match,x=1) [1] 3 2 2 2 1 NA _______________________ Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky On 05/04/2011 07:52 AM, Albert-Jan Roskam wrote: > Hello, > > A simple question perhaps, but how do I, within each row, find the first > occurence of the number 1 in the df below? I want to use this position to > programmatically create the variable 'year'. I'v come up with a solution, but I > find it downright ugly. Is there a simpler way? I was hoping for a useful > built-in function that I don;t yet know about. 
> > df<- data.frame(j1999=c(0,0,0,0,1,0), j2000=c(NA, 1, 1, 1, 0, 0), j2001=c(1, 0, > 1, 0, 0, 0), year=c(2001, 2000, 2000, 2000, 1999, NA)) > library(gsubfn) > x<- apply(df==1, 1, which) > giveYear<- function(df) { return( as.numeric(gsubfn("^[^0-9]+", "", > names(df)[1])) ) } > df$year2<- sapply(x, giveYear) > > Thanks in advance! > > Cheers!! > Albert-Jan > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > All right, but apart from the sanitation, the medicine, education, wine, public > order, irrigation, roads, a fresh water system, and public health, what have the > Romans ever done for us? > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From pdalgd at gmail.com Wed May 4 14:25:35 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Wed, 4 May 2011 14:25:35 +0200 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: References: Message-ID: On May 4, 2011, at 11:03 , JP wrote: > On 3 May 2011 20:50, peter dalgaard wrote: >> >> On Apr 28, 2011, at 15:18 , JP wrote: >> >>> >>> >>> I have found that when doing a wilcoxon signed ranked test you should report: >>> >>> - The median value (and not the mean or sd, presumably because of the >>> underlying potential non normal distribution) >>> - The Z score (or value) >>> - r >>> - p value >>> >> >> ...printed on 40g/m^2 acid free paper with a pencil of 3B softness? >> >> Seriously, with nonparametrics, the p value is the only thing of real interest, the other stuff is just attempting to check on authors doing their calculations properly. 
The median difference is of some interest, but it is not actually what is being tested, and in heavily tied data, it could even be zero with a highly significant p-value. The Z score can in principle be extracted from the p value (qnorm(p/2), basically) but it's obviously unstable in the extreme cases. What is r? The correlation? Pearson, not Spearman? >> > > Thanks for this Peter - a couple of more questions: > > a <- rnorm(500) > b <- runif(500, min=0, max=1) > x <- wilcox.test(a, b, alternative="two.sided", exact=T, paired=T) > x$statistic > > V > 31835 > > What is V? (is that the value Z of the test statistic)? No. It's the sum of the positive ranks: r <- rank(abs(x)) STATISTIC <- sum(r[x > 0]) names(STATISTIC) <- "V" (where x is actually x-y in the paired case) Subtract the expected value of V (sum(1:500)/2 == 62625) in your case, and divide by the standard deviation (sqrt(500*501*1001/24)=3232.327) and you get Z=-9.54. The slight discrepancy is likely due to your use of exact=T (so your p value is not actually computed from Z). > > z.score <- qnorm(x$p.value/2) > [1] -9.805352 > > But what does this zscore show in practice? That your test statistic is approx. 10 standard deviations away from its mean, if the null hypothesis were to be true. > > The d.f. are suggested to be reported here: > http://staff.bath.ac.uk/pssiw/stats2/page2/page3/page3.html > Some software replaces the asymptotic normal distribution of the rank sums with the t-distribution with the same df as would be used in an ordinary t test. However, since there is no such thing as an independent variance estimate in the Wilcoxon test, it is hard to see how that should be an improvement. I have it down to "coding by non-statistician". > And r is mentioned here > http://huberb.people.cofc.edu/Guide/Reporting_Statistics%20in%20Psychology.pdfs > > Aha, so it's supposed to be the effect size. On the referenced site they suggest to use r=Z/sqrt(N). 
(They even do so for the independent samples version, which looks wrong to me). > >>> My questions are: >>> >>> - Are the above enough/correct values to report (some places even >>> quote W and df) ? >> >> df is silly, and/or blatantly wrong... >> >>> What else would you suggest? >>> - How do I calculate the Z score and r for the above example? >>> - How do I get each statistic from the pairwise.wilcox.test call? >>> >>> Many Thanks >>> JP >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> -- >> Peter Dalgaard >> Center for Statistics, Copenhagen Business School >> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >> Phone: (+45)38153501 >> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com >> >> -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From fomcl at yahoo.com Wed May 4 14:28:33 2011 From: fomcl at yahoo.com (Albert-Jan Roskam) Date: Wed, 4 May 2011 05:28:33 -0700 (PDT) Subject: [R] first occurrence of a value? In-Reply-To: <4DC143D5.3040109@uky.edu> References: <688112.87808.qm@web110712.mail.gq1.yahoo.com> <4DC143D5.3040109@uky.edu> Message-ID: <728950.34752.qm@web110704.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From bromideh at hotmail.com Wed May 4 12:01:36 2011 From: bromideh at hotmail.com (Ali Akbar Bromideh) Date: Wed, 4 May 2011 10:01:36 +0000 Subject: [R] How to fit a random data into Beta distribution? In-Reply-To: References: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com>, , , Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From simon.frey at boku.ac.at Wed May 4 11:00:22 2011 From: simon.frey at boku.ac.at (smoff) Date: Wed, 4 May 2011 02:00:22 -0700 (PDT) Subject: [R] Format ddmmYYYY in date Message-ID: <1304499622764-3494921.post@n4.nabble.com> Hello everybody, I'm quite new to using R so please do not kill me if I ask stupid questions. My problem is that I have a table containing dates in the first column, covering 10 years. These dates have the format ddmmYYYY, at least in the csv file. After importing the file using read.table(), R deletes the first character if it is a zero. So e.g. if it's 01012010, R displays it as 1012010. Now of course I cannot change the format of this column into a date using as.Date or strptime, or at least I don't know how, because R wants all entries to be the same length. So it converts only entries like 10102000 to 10.10.2000 but writes NA for shorter entries. How do I solve this problem? Is there a way to tell R not to delete the first character even if it is a zero, or to directly read the first column as a date? Thank you, cheers, Simon -- View this message in context: http://r.789695.n4.nabble.com/Format-ddmmYYYY-in-date-tp3494921p3494921.html Sent from the R help mailing list archive at Nabble.com. From MatthiasNeff at gmx.ch Wed May 4 11:01:44 2011 From: MatthiasNeff at gmx.ch (tornanddesperate) Date: Wed, 4 May 2011 02:01:44 -0700 (PDT) Subject: [R] select value from a column depending on a value in another column Message-ID: <1304499704219-3494926.post@n4.nabble.com> Hi everybody, I couldn't find the solution to what must be quite a simple problem. Maybe you can help?

  treatment session period stage wage_accepted market
1         1       1      1     1            25 public
2         1       1      1     1            19 privat
3         1       1      1     1            15 public
4         1       1      1     2            32 public
5         1       1      1     2            13 privat

From this table, I'd like to choose only those values in the column "wage_accepted" that have the value "public" in the column "market". How can I do this?
Is there a good general help site for R that would explain basic table manipulations such as this? Thank you very much for your help -- View this message in context: http://r.789695.n4.nabble.com/select-value-from-a-column-depending-on-a-value-in-another-column-tp3494926p3494926.html Sent from the R help mailing list archive at Nabble.com. From jujokopr at jyu.fi Wed May 4 12:44:04 2011 From: jujokopr at jyu.fi (TheSavageSam) Date: Wed, 4 May 2011 03:44:04 -0700 (PDT) Subject: [R] Is this confict of different versions of R or something else? Message-ID: <1304505844374-3495104.post@n4.nabble.com> Hello! I have had some problems lately with use of R at home and school. At my home laptop (Ubuntu linux 64bit & R-2.12.0) R works just fine. But when I take my codes to school(Windows XP 32bit & R-2.10.1 I think) and run those there those codes probably won't work. Functions as combinations() didn't work and expand.grid() worked a bit differently. About expand.grid I just couldn't pass a dataframe of parametres but I had to pass all the colums separately. Is this because my university has so old version of R? Or is it because I am using Linux at home? Or is it because some libraries aren't installed? I like R very much, but if there is difference between R in different operating systems, then I dislike that. Can you give me some tips how to avoid these problems? Install latest R to my university PC? I don't want to fall back at Windows users -category anymore. -- TheSavageSam -- View this message in context: http://r.789695.n4.nabble.com/Is-this-confict-of-different-versions-of-R-or-something-else-tp3495104p3495104.html Sent from the R help mailing list archive at Nabble.com. 
From rostohar at gmail.com Wed May 4 12:46:52 2011
From: rostohar at gmail.com (KateR)
Date: Wed, 4 May 2011 03:46:52 -0700 (PDT)
Subject: [R] compute coefficient of determination (R-squared) for GLM (maximum likelihood)
In-Reply-To:
References:
Message-ID: <1304506012489-3495108.post@n4.nabble.com>

Dear Mr. Joris Meys,

I would like to know where to find these formulas in books or articles.

  # possibility 1
  R2 <- cor(y,predict(mod))^2
  # possibility 2
  R2 <- 1 - ( sum( (y-predict(mod))^2 ) / sum( (y-mean(y))^2 ) )

The first calculation seems OK; it gives sensible values for the models (from 0 to 1). The second gives negative values; a higher correlation between y and the prediction gives a more negative R2 value (down to -85).

My second question looks logical, but I need a more theoretical answer: why are R^2 (r-square) values not appropriate for use with non-linear regression models (like exponential)?

Thank you for your answers.

Greetings,
Kate

--
View this message in context: http://r.789695.n4.nabble.com/compute-coefficient-of-determination-R-squared-for-GLM-maximum-likelihood-tp2261975p3495108.html
Sent from the R help mailing list archive at Nabble.com.

From maharwood at hotmail.com Wed May 4 14:18:22 2011
From: maharwood at hotmail.com (Mike Harwood)
Date: Wed, 4 May 2011 07:18:22 -0500
Subject: [R] ID parameter in model
In-Reply-To: <197e8ae6-a4f8-42c8-9318-8162a3cbae40@gu8g2000vbb.googlegroups.com>
References: <197e8ae6-a4f8-42c8-9318-8162a3cbae40@gu8g2000vbb.googlegroups.com>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From halldor.bjornsson at gmail.com Wed May 4 14:52:03 2011
From: halldor.bjornsson at gmail.com (Halldór Björnsson)
Date: Wed, 4 May 2011 12:52:03 +0000
Subject: [R] bivariate linear interpolation
Message-ID:

Hi,

I have three matrices (X,Y,P) with the same dimension. The X,Y grid is regular and I want to perform linear interpolation to pick out certain points.
In Matlab the appropriate call is something like

  Pout=interp2(X,Y,P,Xout,Yout, method="linear")

where Xout and Yout are the locations where I want the Pout data (typically a different grid). (Scipy has this routine in interpolate.interp2d, with similar arguments.)

In R there is (as often) a choice between many different interpolation routines. Akima has one for irregularly spaced data (and does not like co-linearity in the data). Fields has another one, with more complicated arguments.

What is the best R function that accomplishes this?

Sincerely
Halldór

From matevz.pavlic at gi-zrmk.si Wed May 4 14:55:31 2011
From: matevz.pavlic at gi-zrmk.si (Matevž Pavlič)
Date: Wed, 4 May 2011 14:55:31 +0200
Subject: [R] scatter plot with Z value
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From bbolker at gmail.com Wed May 4 14:57:55 2011
From: bbolker at gmail.com (Ben Bolker)
Date: Wed, 4 May 2011 12:57:55 +0000
Subject: [R] Format ddmmYYYY in date
References: <1304499622764-3494921.post@n4.nabble.com>
Message-ID:

smoff <simon.frey at boku.ac.at> writes:

> My problem is that I have a table containing dates in the first column of 10
> years. These dates have the format ddmmYYYY at least in the csv-file. After
> importing the file using read.table() R deletes the first character if it is
> a zero.

[snip]

> How do I solve this problem? Is there a way to tell R not to delete the
> first character even if it is a zero or to directly read the first column as
> date?

See the "colClasses" argument of ?read.table ...
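A minimal sketch of the colClasses suggestion, assuming a whitespace-separated file with the ddmmYYYY strings in the first column. The column names, the sample values and the object names txt, dat and padded are made up for illustration; in practice the file itself would be passed to read.table().

```r
# Hypothetical two-column input standing in for the real csv file.
txt <- "date value
01012010 1.5
10102000 2.5"

# Reading the first column as character keeps the leading zero.
dat <- read.table(text = txt, header = TRUE,
                  colClasses = c("character", "numeric"))

# Convert the ddmmYYYY strings to Date objects.
dat$date <- as.Date(dat$date, format = "%d%m%Y")

# If the column has already been read as a number, the zero can be
# restored by padding to eight digits before converting.
padded <- sprintf("%08d", 1012010L)
as.Date(padded, format = "%d%m%Y")
```

Reading as character first and converting afterwards keeps the two steps separate, which makes it easier to spot malformed entries before they turn into NA.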
(added a little bit of text to make gmane happy) From A.Robinson at ms.unimelb.edu.au Wed May 4 15:00:11 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 23:00:11 +1000 Subject: [R] select value from a column depending on a value in another column In-Reply-To: <1304499704219-3494926.post@n4.nabble.com> References: <1304499704219-3494926.post@n4.nabble.com> Message-ID: <20110504130011.GA827@ms.unimelb.edu.au> Try subset(). Andrew On Wed, May 04, 2011 at 02:01:44AM -0700, tornanddesperate wrote: > Hi everybody > > I couldn't find the solution to what must be quite a simple problem. Maybe > you can help? > > treatment session period stage wage_accepted market > 1 1 1 1 1 25 public > 2 1 1 1 1 19 privat > 3 1 1 1 1 15 public > 4 1 1 1 2 32 public > 5 1 1 1 2 13 privat > > >From this table, I'd like to choose only those values in the column > "wage_accepted" that have the value "public" in the column "market". How can > I do this? > > Is there a good general help site for R that would explain basic table > manipulations such as this? > > Thank you very much for your help > > > > -- > View this message in context: http://r.789695.n4.nabble.com/select-value-from-a-column-depending-on-a-value-in-another-column-tp3494926p3494926.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
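As a sketch of what the subset() suggestion could look like on the five rows posted in the question (the object names wages, public_rows and public_wages are hypothetical):

```r
# The five rows from the question, rebuilt as a data frame.
wages <- data.frame(
  treatment     = c(1, 1, 1, 1, 1),
  session       = c(1, 1, 1, 1, 1),
  period        = c(1, 1, 1, 1, 1),
  stage         = c(1, 1, 1, 2, 2),
  wage_accepted = c(25, 19, 15, 32, 13),
  market        = c("public", "privat", "public", "public", "privat")
)

# Keep rows where market is "public", and only the wage_accepted column.
public_rows <- subset(wages, market == "public", select = wage_accepted)

# The same values as a plain vector, using logical indexing.
public_wages <- wages$wage_accepted[wages$market == "public"]
```

subset() returns a data frame, while the indexing form returns a bare vector; which one is more convenient depends on what happens to the values next.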
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Wed May 4 15:02:57 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Wed, 4 May 2011 23:02:57 +1000 Subject: [R] what happens when I store linear models in an array? In-Reply-To: <4dc13d51.c9860e0a.331c.2bf6@mx.google.com> References: <4dc13d51.c9860e0a.331c.2bf6@mx.google.com> Message-ID: <20110504130257.GB827@ms.unimelb.edu.au> Hi Andrew, try fitted(lms.ASP[1,1][[1]]) Cheers Andrew On Wed, May 04, 2011 at 01:49:45PM +0200, Andrew D. Steen wrote: > I've got a bunch of similar datasets, all of which I've fit to linear > models. I'd like to easily create arrays of a specific parameter from each > linear model (e.g., all of the intercepts in one array). I figured I'd put > the model objects into an array, and then (somehow) I could easily create > corresponding arrays of intercepts or residuals or whatever, but I can't the > parameters back out. > > > > Right now I've stored the model objects in a 2-D array: > > lms.ASP <- array(list(), c(3,4)) > > > > Then I fill the array element-by-element: > > surf105.lm. 
ASP <- lm(ASP ~ time) > > lms.ASP[1,1] <- list(surf105.lm.ASP) > > > > Something is successfully being stored in the array: > > test <- lms.tx.ASP[1,1] > > test > [[1]] > Call: > lm(formula = ASP ~ time) > Coefficients: > (Intercept) elapsed.time..hr > 0.430732 0.004073 > > > > But I can't seem to call extraction functions on the linear models: > > fitted(lms.ASP[1,1]) > NULL > > > > It seems like something less than the actual linear model object is being > stored in the array, but I don't understand what's happening, or how to > easily batch-extract parameters of linear models. Any advice? > > > > > > ____________________________________ > > Andrew D. Steen, Ph.D. > > Center for Geomicrobiology, Aarhus University > > Ny Munkegade 114 > DK-8000 Aarhus C > Denmark > Tel: +45 8942 3241 > Fax: +45 8942 2722 > > andrew.steen at biology.au.dk > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From jeanpaul.ebejer at inhibox.com Wed May 4 15:11:15 2011 From: jeanpaul.ebejer at inhibox.com (JP) Date: Wed, 4 May 2011 14:11:15 +0100 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: References: Message-ID: Peter thanks for the fantastically simple and understandable explanation... 
To sum it up... to find the z values of a number of pairwise wilcox tests do the following: # pairwise tests with bonferroni correction x <- pairwise.wilcox.test(a, b, alternative="two.sided", p.adj="bonferroni", exact=F, paired=T) # what is the data structure we got back is.matrix(x$p.value) # p vals x$p.value # z.scores for each z.score <- qnorm(x$p.value / 2) On 4 May 2011 13:25, peter dalgaard wrote: > > On May 4, 2011, at 11:03 , JP wrote: > >> On 3 May 2011 20:50, peter dalgaard wrote: >>> >>> On Apr 28, 2011, at 15:18 , JP wrote: >>> >>>> >>>> >>>> I have found that when doing a wilcoxon signed ranked test you should report: >>>> >>>> - The median value (and not the mean or sd, presumably because of the >>>> underlying potential non normal distribution) >>>> - The Z score (or value) >>>> - r >>>> - p value >>>> >>> >>> ...printed on 40g/m^2 acid free paper with a pencil of 3B softness? >>> >>> Seriously, with nonparametrics, the p value is the only thing of real interest, the other stuff is just attempting to check on authors doing their calculations properly. The median difference is of some interest, but it is not actually what is being tested, and in heavily tied data, it could even be zero with a highly significant p-value. The Z score can in principle be extracted from the p value (qnorm(p/2), basically) but it's obviously unstable in the extreme cases. What is r? The correlation? Pearson, not Spearman? >>> >> >> Thanks for this Peter - a couple of more questions: >> >> a <- rnorm(500) >> b <- runif(500, min=0, max=1) >> x <- wilcox.test(a, b, alternative="two.sided", exact=T, paired=T) >> x$statistic >> >> ? ?V >> 31835 >> >> What is V? (is that the value Z of the test statistic)? > > No. It's the sum of the positive ranks: > > ? ? ? ?r <- rank(abs(x)) > ? ? ? ?STATISTIC <- sum(r[x > 0]) > ? ? ? 
?names(STATISTIC) <- "V" > > (where x is actually x-y in the paired case) > > Subtract the expected value of V (sum(1:500)/2 == 62625) in your case, and divide by the standard deviation (sqrt(500*501*1001/24)=3232.327) and you get Z=-9.54. The slight discrepancy is likely due to your use of exact=T (so your p value is not actually computed from Z). > > >> >> z.score <- qnorm(x$p.value/2) >> [1] -9.805352 >> >> But what does this zscore show in practice? > > > That your test statistic is approx. 10 standard deviations away from its mean, if the null hypothesis were to be true. > > >> >> The d.f. are suggested to be reported here: >> http://staff.bath.ac.uk/pssiw/stats2/page2/page3/page3.html >> > > Some software replaces the asymptotic normal distribution of the rank sums with the t-distribution with the same df as would be used in an ordinary t test. However, since there is no such thing as an independent variance estimate in the Wilcoxon test, it is hard to see how that should be an improvement. I have it down to "coding by non-statistician". > > >> And r is mentioned here >> http://huberb.people.cofc.edu/Guide/Reporting_Statistics%20in%20Psychology.pdfs >> >> > > Aha, so it's supposed to be the effect size. On the referenced site they suggest to use r=Z/sqrt(N). (They even do so for the independent samples version, which looks wrong to me). > >> >>>> My questions are: >>>> >>>> - Are the above enough/correct values to report (some places even >>>> quote W and df) ? >>> >>> df is silly, and/or blatantly wrong... >>> >>>> ?What else would you suggest? >>>> - How do I calculate the Z score and r for the above example? >>>> - How do I get each statistic from the pairwise.wilcox.test call? 
>>>> >>>> Many Thanks >>>> JP >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> -- >>> Peter Dalgaard >>> Center for Statistics, Copenhagen Business School >>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >>> Phone: (+45)38153501 >>> Email: pd.mes at cbs.dk ?Priv: PDalgd at gmail.com >>> >>> > > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk ?Priv: PDalgd at gmail.com > > > From bt_jannis at yahoo.de Wed May 4 15:14:09 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 4 May 2011 14:14:09 +0100 (BST) Subject: [R] scatter plot with Z value In-Reply-To: Message-ID: <572593.28631.qm@web28211.mail.ukl.yahoo.com> I am sure you would have found answers to your questions if you would have searched the mailing list archive (http://r.789695.n4.nabble.com/R-help-f789696.html)! To get you started, have a look at: ?text (for the z values) ?abline (for the line) Jannis --- Matev? Pavli? schrieb am Mi, 4.5.2011: > Von: Matev? Pavli? > Betreff: [R] scatter plot with Z value > An: r-help at r-project.org > Datum: Mittwoch, 4. Mai, 2011 12:55 Uhr > Hi all, > > > > I would like to create a scatter plot of two variables (y, > x)? whith third value (z) written on the plot? After > that i would like to add a line (Y=0.7*(x-20)) to the graph. > > > > > I tried > > plot(x~y) > > > > but there is no command for the third vairable to be shown > on the graph > > also i can't find a way to add a Y=x*(0.7-20) to the > chart. > > > > Thanks, m > > > ??? 
[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > From bt_jannis at yahoo.de Wed May 4 15:17:13 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 4 May 2011 14:17:13 +0100 (BST) Subject: [R] Problems saving ff objects Message-ID: <86317.99948.qm@web28204.mail.ukl.yahoo.com> Dear list, I am trying to understand and use the ff package. As I had some problems saving some ff objects, and as I did not fully manage to understand the whole concept of *.ff, *.ffData and *.RData with the help of the documentation, I tried to reproduce the examples from the help of ffsave. When I ran, however : (copied from the help) message("let's create some ff objects") n <- 8e3 a <- ff(sample(n, n, TRUE), vmode="integer", length=n, filename="d:/tmp/a.ff") b <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/b.ff") x <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/x.ff") y <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/y.ff") z <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/z.ff") df <- ffdf(x=x, y=y, z=z) rm(x,y,z) message("save all of them") ffsave.image("d:/tmp/x") I get: Error in ffsave(list = ls(envir = .GlobalEnv, all.names = TRUE), file = outfile, : the previous files do not match the rootpath (case sensitive) Whats wrong here? Should this not be working as I did not change anything in the code? 
Cheers Jannis > sessionInfo() R version 2.12.0 (2010-10-15) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] ff_2.2-2 bit_1.1-7 rj_0.5.2-1 loaded via a namespace (and not attached): [1] rJava_0.8-8 From rbaer at atsu.edu Wed May 4 15:28:50 2011 From: rbaer at atsu.edu (Robert Baer) Date: Wed, 4 May 2011 08:28:50 -0500 Subject: [R] Watts Strogatz game In-Reply-To: <20110504002020.GT48756@ms.unimelb.edu.au> References: <1304408575725-3491922.post@n4.nabble.com> <20110504002020.GT48756@ms.unimelb.edu.au> Message-ID: <7E09AB89357C4B83813921C9C8FCC8A0@kcom.edu> >> I have a erdos-renyi game with 6000 nodes and probability 0.003. >> >> g1 = erdos.renyi.game(6000, 0.003) >> >> How to create a Watts Strogatz game with the same probability. >> >> g1 = watts.strogatz.game(1, 6000, ?, ?) >> What should be the third and fourth parameter to this argument. According to ?watts.strogatz.game help file (in the igraph package?), the four arguments to this function are: dim Integer constant, the dimension of the starting lattice. size Integer constant, the size of the lattice along each dimension. nei Integer constant, the neighborhood within which the vertices of the lattice will be connected. p Real constant between zero and one, the rewiring probability. So it looks like the last two should be neighborhood and rewiring probability respectively. 
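One way to choose the third argument so that the density roughly matches the Erdos-Renyi graph can be sketched with plain arithmetic. This assumes that the starting lattice for dim = 1 is a ring in which each vertex is joined to nei vertices on either side, and that rewiring leaves the number of edges unchanged; the rewiring probability itself is a separate choice and is not determined by the Erdos-Renyi probability.

```r
n <- 6000      # number of vertices, from the question
p <- 0.003     # Erdos-Renyi edge probability, from the question

# Expected mean degree of the Erdos-Renyi graph.
er_mean_degree <- p * (n - 1)

# In a one-dimensional ring lattice with neighbourhood nei, every vertex
# has nei neighbours on each side, so its degree is 2 * nei (assumption
# about the lattice construction, see the note above).
nei <- round(er_mean_degree / 2)

# Rewiring moves edges but does not add or remove them, so the mean
# degree of the small-world graph stays at 2 * nei.
ws_mean_degree <- 2 * nei

c(er = er_mean_degree, ws = ws_mean_degree, nei = nei)
```

Because nei must be an integer, the match is only approximate in general; here the two mean degrees happen to be almost equal.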
From bt_jannis at yahoo.de Wed May 4 15:31:51 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 4 May 2011 14:31:51 +0100 (BST) Subject: [R] Problems saving ff objects In-Reply-To: <86317.99948.qm@web28204.mail.ukl.yahoo.com> Message-ID: <968391.72417.qm@web28208.mail.ukl.yahoo.com> Just did some more testing.....May the problem be due to the fact that I am using a windows machine? I just ran the same code on a Linux machine and everything worked fine. If windows (or the file system of the disk) caused the problem, is there any way to resolve it? I know that using Linux would be a better choice ;-) but unfortunatley this in no option at the moment.... Best Jannis --- Jannis schrieb am Mi, 4.5.2011: > Von: Jannis > Betreff: [R] Problems saving ff objects > An: r-help at r-project.org > Datum: Mittwoch, 4. Mai, 2011 13:17 Uhr > Dear list, > > > I am trying to understand and use the ff package. As I had > some problems saving some ff objects, and as I did not fully > manage to understand the whole concept of *.ff, *.ffData and > *.RData with the help of the documentation, I tried to > reproduce the examples from the help of ffsave. > > When I ran, however : (copied from the help) > > message("let's create some ff objects") > ? n <- 8e3 > ? a <- ff(sample(n, n, TRUE), vmode="integer", > length=n, filename="d:/tmp/a.ff") > ? b <- ff(sample(255, n, TRUE), vmode="ubyte", > length=n, filename="d:/tmp/b.ff") > ? x <- ff(sample(255, n, TRUE), vmode="ubyte", > length=n, filename="d:/tmp/x.ff") > ? y <- ff(sample(255, n, TRUE), vmode="ubyte", > length=n, filename="d:/tmp/y.ff") > ? z <- ff(sample(255, n, TRUE), vmode="ubyte", > length=n, filename="d:/tmp/z.ff") > ? df <- ffdf(x=x, y=y, z=z) > ? rm(x,y,z) > > ? message("save all of them") > ? ffsave.image("d:/tmp/x") > > I get: > > Error in ffsave(list = ls(envir = .GlobalEnv, all.names = > TRUE), file = outfile,? : > ? the previous files do not match the rootpath (case > sensitive) > > > Whats wrong here? 
Should this not be working as I did not > change anything in the code? > > > > Cheers > Jannis > > > > sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252??? > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C? ? ? ? ? ? > ? ? ? ? ? ? ? > [5] LC_TIME=English_United States.1252? ? > > attached base packages: > [1] tools? ???stats? > ???graphics? grDevices utils? > ???datasets? methods? > [8] base? ??? > > other attached packages: > [1] ff_2.2-2???bit_1.1-7? rj_0.5.2-1 > > loaded via a namespace (and not attached): > [1] rJava_0.8-8 > > > > > ______________________________________________ > R-help at r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > From biomathjdaily at gmail.com Wed May 4 16:03:48 2011 From: biomathjdaily at gmail.com (Jonathan Daily) Date: Wed, 4 May 2011 10:03:48 -0400 Subject: [R] what happens when I store linear models in an array? In-Reply-To: <20110504130257.GB827@ms.unimelb.edu.au> References: <4dc13d51.c9860e0a.331c.2bf6@mx.google.com> <20110504130257.GB827@ms.unimelb.edu.au> Message-ID: It looks like your call: lms.ASP[1,1] <- list(surf105.lm.ASP) makes a list of length 1 containing the lm object and puts that list into element [1,1] of your array. That is why you will need the extra indexing call of [[1]] Andrew Robinson suggested. On Wed, May 4, 2011 at 9:02 AM, Andrew Robinson wrote: > Hi Andrew, > > try > > fitted(lms.ASP[1,1][[1]]) > > Cheers > > Andrew > > On Wed, May 04, 2011 at 01:49:45PM +0200, Andrew D. Steen wrote: >> I've got a bunch of similar datasets, all of which I've fit to linear >> models. 
?I'd like to easily create arrays of a specific parameter from each >> linear model (e.g., all of the intercepts in one array). ?I figured I'd put >> the model objects into an array, and then (somehow) I could easily create >> corresponding arrays of intercepts or residuals or whatever, but I can't the >> parameters back out. >> >> >> >> Right now I've stored the model objects in a 2-D array: >> > lms.ASP <- array(list(), c(3,4)) >> >> >> >> Then I fill the array element-by-element: >> > surf105.lm. ASP <- lm(ASP ~ time) >> > lms.ASP[1,1] <- list(surf105.lm.ASP) >> >> >> >> Something is successfully being stored in the array: >> > test <- lms.tx.ASP[1,1] >> > test >> [[1]] >> Call: >> lm(formula = ASP ~ time) >> Coefficients: >> ? ? ?(Intercept) ?elapsed.time..hr >> ? ? ?0.430732 ? ? ? ? ?0.004073 >> >> >> >> But I can't seem to call extraction functions on the linear models: >> > fitted(lms.ASP[1,1]) >> NULL >> >> >> >> It seems like something less than the actual linear model object is being >> stored in the array, but I don't understand what's happening, or how to >> easily batch-extract parameters of linear models. ?Any advice? >> >> >> >> >> >> ____________________________________ >> >> Andrew D. Steen, Ph.D. >> >> Center for Geomicrobiology, Aarhus University >> >> Ny Munkegade 114 >> DK-8000 Aarhus C >> Denmark >> Tel: +45 8942 3241 >> Fax: +45 8942 2722 >> >> andrew.steen at biology.au.dk >> >> >> >> >> ? ? ? [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- > Andrew Robinson > Program Manager, ACERA > Department of Mathematics and Statistics ? ? ? ? ? ?Tel: +61-3-8344-6410 > University of Melbourne, VIC 3010 Australia ? ? ? ? ? ? ? 
(prefer email) > http://www.ms.unimelb.edu.au/~andrewpr ? ? ? ? ? ? ?Fax: +61-3-8344-4599 > http://www.acera.unimelb.edu.au/ > > Forest Analytics with R (Springer, 2011) > http://www.ms.unimelb.edu.au/FAwR/ > Introduction to Scientific Programming and Simulation using R (CRC, 2009): > http://www.ms.unimelb.edu.au/spuRs/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- =============================================== Jon Daily Technician =============================================== #!/usr/bin/env outside # It's great, trust me. From scttchamberlain4 at gmail.com Wed May 4 14:51:22 2011 From: scttchamberlain4 at gmail.com (Scott Chamberlain) Date: Wed, 4 May 2011 07:51:22 -0500 Subject: [R] select value from a column depending on a value in another column In-Reply-To: <1304499704219-3494926.post@n4.nabble.com> References: <1304499704219-3494926.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From janine.halder at unil.ch Wed May 4 15:37:43 2011 From: janine.halder at unil.ch (Janhal) Date: Wed, 4 May 2011 06:37:43 -0700 (PDT) Subject: [R] Superscript number before letter Message-ID: <1304516263466-3495577.post@n4.nabble.com> Salut, I have been struggling to superscript the 18 before the O without ^ visible and found only help to superscript numbers after the letter. Thanks to anyone who can help. xlab=expression(delta*18O VSMOW [?]") Cheers, Janine -- View this message in context: http://r.789695.n4.nabble.com/Superscript-number-before-letter-tp3495577p3495577.html Sent from the R help mailing list archive at Nabble.com. 
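A possible approach, sketched with plotmath: an empty group {} gives the superscript something to attach to, so the 18 is drawn raised and directly in front of the O. The unit in square brackets was garbled in the archived post, so it is left out here; it could be appended inside the quoted string. The name lab is made up for illustration.

```r
# {} is an empty group, so ^18 is drawn as a superscript attached to
# nothing and therefore appears just before the O.
lab <- expression(delta * {}^18 * "O VSMOW")

# Draw to a throwaway PDF file so the sketch also runs without a display.
pdf(file = tempfile(fileext = ".pdf"))
plot(1:10, xlab = lab)
dev.off()
```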
From clanders at utmb.edu Wed May 4 15:42:51 2011
From: clanders at utmb.edu (algorimancer)
Date: Wed, 4 May 2011 06:42:51 -0700 (PDT)
Subject: [R] Unexp. behavior from boot with multiple statistics
In-Reply-To: <20110503231154.GP48756@ms.unimelb.edu.au>
References: <1304450105554-3493300.post@n4.nabble.com> <20110503231154.GP48756@ms.unimelb.edu.au>
Message-ID: <1304516571892-3495590.post@n4.nabble.com>

Thanks, that clears things up quite a bit. Now I'm left wondering why there is so much bias, but that's a separate issue.

--
View this message in context: http://r.789695.n4.nabble.com/Unexp-behavior-from-boot-with-multiple-statistics-tp3493300p3495590.html
Sent from the R help mailing list archive at Nabble.com.

From reith_william at bah.com Wed May 4 15:55:07 2011
From: reith_william at bah.com (wwreith)
Date: Wed, 4 May 2011 06:55:07 -0700 (PDT)
Subject: [R] Storing data from a test as a vector or matrix
Message-ID: <1304517307398-3495626.post@n4.nabble.com>

I just finished a MANOVA test and got the following output:

> summary(M, test="Pillai")
              Df Pillai approx F num Df den Df    Pr(>F)
as.factor(X)   3 1.1922   6.5948     36    360 < 2.2e-16 ***
Residuals    129
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I would like to store the values Df=3, Pillai=1.1922, P-value, etc. as a vector. I have tried the following code that did not work:

> S=summary(M, test="Pillai")
> S1<-as.vector(S$Df, S$Pillai)

but I am getting an error every time. I have also tried just S$Df. Is there a way to find out if "Df" and "Pillai" are correct headers to reference? For example maybe it is "df" or "degreesfreedom" etc.

I know the concept works because I have used it for logistic regression.

> v1 <- as.vector(exp(L1$coefficients))

The difference is that I know "coefficients" is the correct header to refer to.
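The question of which names a summary object actually carries can be answered by inspecting it with names() or str(). A minimal sketch, using the built-in iris data as a stand-in for the poster's data (the model formula and the name term_stats are illustrative only):

```r
# Built-in iris data stand in for the poster's data.
M <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
S <- summary(M, test = "Pillai")

# List the components of the summary object to see what can be extracted.
names(S)

# The printed table is stored as a numeric matrix in the "stats" component.
colnames(S$stats)

# Pull the first row (the model term) out as a named numeric vector.
term_stats <- S$stats[1, ]
term_stats[c("Df", "Pillai", "Pr(>F)")]
```

So the values live in S$stats rather than in S$Df or S$Pillai, and they are indexed by the column names shown in the printed table.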
--
View this message in context: http://r.789695.n4.nabble.com/Storing-data-from-a-test-as-a-vector-or-matrix-tp3495626p3495626.html
Sent from the R help mailing list archive at Nabble.com.

From sterlesser at hotmail.com Wed May 4 16:04:36 2011
From: sterlesser at hotmail.com (sterlesser)
Date: Wed, 4 May 2011 07:04:36 -0700 (PDT)
Subject: [R] nls problem with R
In-Reply-To: <5CD78996B8F8844D963C875D3159B94A02354D39@dsrcorreo>
References: <1304482083098-3494454.post@n4.nabble.com> <20110504071506.GU48756@ms.unimelb.edu.au> <5CD78996B8F8844D963C875D3159B94A02354D39@dsrcorreo>
Message-ID: <1304517876886-3495663.post@n4.nabble.com>

Thanks Ruben. Your suggestion to analyse the model itself more deeply is really helpful. I am trying out some new initial values based on the analysis of the special T2 in the model.

--
View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495663.html
Sent from the R help mailing list archive at Nabble.com.

From xianzhang at gmail.com Wed May 4 15:55:44 2011
From: xianzhang at gmail.com (Xian Zhang)
Date: Wed, 4 May 2011 15:55:44 +0200
Subject: [R] create a folder with mode '0777'
Message-ID:

Dear list,

I am trying to create a folder structure, say 'test/sub', and make the folder and the sub folder writable for everyone.

By default

  dir.create('test/sub', recursive=TRUE, mode='0777')

creates folders with mode:

  drwxr-xr-x

After

  Sys.chmod('test/sub',mode='0777')

the folder 'test' is:

  drwxr-xr-x

and the sub folder 'sub' is:

  drwxrwxrwx

The question is how to generate a folder and its sub folders with every folder being drwxrwxrwx. I am using a linux/redhat system.

Thank you for your help.
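The drwxr-xr-x result is what a umask of 022 would produce from a requested mode of 0777, so one possible approach is to clear the umask around the call. This is a sketch under that assumption; it works in a temporary directory, and the names base and old_umask are made up for illustration.

```r
# Work in a temporary directory so the sketch leaves nothing behind.
base <- tempfile("perm_demo")
dir.create(base)

# The requested mode is masked by the process umask (often 022, which
# turns 0777 into 0755). Clear the umask for the duration of the call.
old_umask <- Sys.umask("000")
dir.create(file.path(base, "test", "sub"), recursive = TRUE, mode = "0777")
Sys.umask(old_umask)

# Check both levels; on a Unix-alike both should report mode 777.
file.info(file.path(base, c("test", "test/sub")))$mode
```

Restoring the previous umask afterwards keeps the change from affecting files created later in the same session.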
Xian From sterlesser at hotmail.com Wed May 4 16:07:44 2011 From: sterlesser at hotmail.com (sterlesser) Date: Wed, 4 May 2011 07:07:44 -0700 (PDT) Subject: [R] nls problem with R In-Reply-To: <20110504071506.GU48756@ms.unimelb.edu.au> References: <1304482083098-3494454.post@n4.nabble.com> <20110504071506.GU48756@ms.unimelb.edu.au> Message-ID: <1304518064519-3495672.post@n4.nabble.com> Thanks Andrew. I am sorry for some typos that I omit some numbers of T2. Based on your suggestion,I think the problem is in the initial values. And I will read more theory about the non-linear regression. -- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495672.html Sent from the R help mailing list archive at Nabble.com. From goran.brostrom at gmail.com Wed May 4 16:21:38 2011 From: goran.brostrom at gmail.com (=?UTF-8?B?R8O2cmFuIEJyb3N0csO2bQ==?=) Date: Wed, 4 May 2011 16:21:38 +0200 Subject: [R] ID parameter in model In-Reply-To: References: <197e8ae6-a4f8-42c8-9318-8162a3cbae40@gu8g2000vbb.googlegroups.com> Message-ID: On Wed, May 4, 2011 at 2:18 PM, Mike Harwood wrote: > Thank you, Goran. ?Please see the package details below: Thanks, I have uploaded a corrected version of eha to CRAN. Should be available soon. G?ran [...] From csardi at rmki.kfki.hu Wed May 4 16:25:47 2011 From: csardi at rmki.kfki.hu (=?ISO-8859-1?B?R+Fib3IgQ3PhcmRp?=) Date: Wed, 4 May 2011 10:25:47 -0400 Subject: [R] Watts Strogatz game In-Reply-To: <7E09AB89357C4B83813921C9C8FCC8A0@kcom.edu> References: <1304408575725-3491922.post@n4.nabble.com> <20110504002020.GT48756@ms.unimelb.edu.au> <7E09AB89357C4B83813921C9C8FCC8A0@kcom.edu> Message-ID: On Wed, May 4, 2011 at 9:28 AM, Robert Baer wrote: >>> I have a erdos-renyi game with 6000 nodes and probability 0.003. >>> >>> g1 = erdos.renyi.game(6000, 0.003) >>> >>> How to create a Watts Strogatz game with the same probability. >>> >>> g1 = watts.strogatz.game(1, 6000, ?, ?) 
>>> What should be the third and fourth parameter to this argument. You can work out the number of edges in a Watts-Strogatz game easily, by calculating the degree of the nodes in the non-randomized network. This will be different for different dimensions, of course. Randomization does not change the average degree. Obviously, you cannot exactly match all Erdos-Renyi graphs, because the W-S density cannot change continuously. Gabor > According to ?watts.strogatz.game help file (in the igraph package?), the > four arguments to this function are: > dim ? ? Integer constant, the dimension of the starting lattice. > size ? ? Integer constant, the size of the lattice along each dimension. > nei ? ? Integer constant, the neighborhood within which the vertices of the > lattice will be connected. > p ? ? ? ? Real constant between zero and one, the rewiring probability. > > So it looks like the last two should be neighborhood and rewiring > probability respectively. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Gabor Csardi ? ?? MTA KFKI RMKI From rvaradhan at jhmi.edu Wed May 4 16:25:53 2011 From: rvaradhan at jhmi.edu (Ravi Varadhan) Date: Wed, 4 May 2011 10:25:53 -0400 Subject: [R] nls problem with R In-Reply-To: <1304518064519-3495672.post@n4.nabble.com> References: <1304482083098-3494454.post@n4.nabble.com> <20110504071506.GU48756@ms.unimelb.edu.au> <1304518064519-3495672.post@n4.nabble.com> Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669FAA4C8@WIGGUMVS.win.ad.jhu.edu> In addition to the suggestion about finding a good initial value, you should also scale your response V2 (and, of course, V0). Divide V2 by 10^4, for example. Now your V0 should also be scaled by this factor. This would likely help with convergence. Ravi. 
------------------------------------------------------- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvaradhan at jhmi.edu -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of sterlesser Sent: Wednesday, May 04, 2011 10:08 AM To: r-help at r-project.org Subject: Re: [R] nls problem with R Thanks Andrew. I am sorry for some typos that I omit some numbers of T2. Based on your suggestion,I think the problem is in the initial values. And I will read more theory about the non-linear regression. -- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495672.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From rmh at temple.edu Wed May 4 16:28:19 2011 From: rmh at temple.edu (Richard M. Heiberger) Date: Wed, 4 May 2011 10:28:19 -0400 Subject: [R] Superscript number before letter In-Reply-To: <1304516263466-3495577.post@n4.nabble.com> References: <1304516263466-3495577.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From petr.pikal at precheza.cz Wed May 4 16:30:37 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Wed, 4 May 2011 16:30:37 +0200 Subject: [R] Odp: Is this confict of different versions of R or something else? In-Reply-To: <1304505844374-3495104.post@n4.nabble.com> References: <1304505844374-3495104.post@n4.nabble.com> Message-ID: Hi r-help-bounces at r-project.org napsal dne 04.05.2011 12:44:04: > TheSavageSam > Odeslal: r-help-bounces at r-project.org > > > Hello! 
> > I have had some problems lately with use of R at home and school. At my home > laptop (Ubuntu linux 64bit & R-2.12.0) R works just fine. But when I take my > codes to school(Windows XP 32bit & R-2.10.1 I think) and run those there > those codes probably won't work. Functions as combinations() didn't work and > expand.grid() worked a bit differently. About expand.grid I just couldn't > pass a dataframe of parametres but I had to pass all the colums separately. > > Is this because my university has so old version of R? Or is it because I am > using Linux at home? Or is it because some libraries aren't installed? > I like R very much, but if there is difference between R in different > operating systems, then I dislike that. Hm. Untill now I thought that different versions are for introducing new features, properties or enhanced computing. You can not expect that everything what works in new version will work in any older version. There can be slight issues with different operation systems, however I believe the biggest problem is old R version in your school PC. Regards Petr > > Can you give me some tips how to avoid these problems? Install latest R to > my university PC? I don't want to fall back at Windows users -category > anymore. > > -- > TheSavageSam > > -- > View this message in context: http://r.789695.n4.nabble.com/Is-this- > confict-of-different-versions-of-R-or-something-else-tp3495104p3495104.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
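
A minimal sketch of how one might check for version and package differences between two machines before suspecting the operating system. It uses base R only. That combinations() came from the gtools package is an assumption (gtools does provide a function of that name, but the original post does not say which package was used), and the params data frame is a made-up example.

```r
# Which R is running on this machine?
R.version.string
getRversion() >= "2.12.0"   # TRUE on R 2.12.0 or newer

# combinations() is not in base R; gtools is one package that provides it
# (assumption: that is the one meant in the post). Base R's combn()
# covers the common case without any extra package.
pairs <- t(combn(4, 2))     # all 2-element subsets of 1:4, one per row

# Passing the columns of a data frame to expand.grid() one by one,
# which does not rely on expand.grid() accepting a data frame directly.
params <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
grid <- do.call(expand.grid,
                c(as.list(params), stringsAsFactors = FALSE))
grid
```

Running sessionInfo() on both machines and comparing the output is another quick way to see which packages are attached on each.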
From pdalgd at gmail.com Wed May 4 16:32:42 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Wed, 4 May 2011 16:32:42 +0200 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: References: Message-ID: <8620F7B8-6856-4E48-A8C8-5D95D210AFDF@gmail.com> On May 4, 2011, at 15:11 , JP wrote: > Peter thanks for the fantastically simple and understandable explanation... > > To sum it up... to find the z values of a number of pairwise wilcox > tests do the following: > > # pairwise tests with bonferroni correction > x <- pairwise.wilcox.test(a, b, alternative="two.sided", > p.adj="bonferroni", exact=F, paired=T) You probably don't want the bonferroni correction there. Rather p.adj="none". You generally correct the p values for multiple testing, not the test statistics. (My sentiment would be to pick apart the stats:::wilcox.test.default function and clone the computation of Z from it, but presumably backtracking from the p value is a useful expedient.) > # what is the data structure we got back > is.matrix(x$p.value) > # p vals > x$p.value > # z.scores for each > z.score <- qnorm(x$p.value / 2) > Hmm, you're not actually getting a signed z out of this, you might want to try alternative="greater" and drop the division by 2 inside qnorm(). (If the signs come out inverted, I meant "less" not "greater"...) > > > On 4 May 2011 13:25, peter dalgaard wrote: >> >> On May 4, 2011, at 11:03 , JP wrote: >> >>> On 3 May 2011 20:50, peter dalgaard wrote: >>>> >>>> On Apr 28, 2011, at 15:18 , JP wrote: >>>> >>>>> >>>>> >>>>> I have found that when doing a wilcoxon signed ranked test you should report: >>>>> >>>>> - The median value (and not the mean or sd, presumably because of the >>>>> underlying potential non normal distribution) >>>>> - The Z score (or value) >>>>> - r >>>>> - p value >>>>> >>>> >>>> ...printed on 40g/m^2 acid free paper with a pencil of 3B softness? 
>>>> >>>> Seriously, with nonparametrics, the p value is the only thing of real interest, the other stuff is just attempting to check on authors doing their calculations properly. The median difference is of some interest, but it is not actually what is being tested, and in heavily tied data, it could even be zero with a highly significant p-value. The Z score can in principle be extracted from the p value (qnorm(p/2), basically) but it's obviously unstable in the extreme cases. What is r? The correlation? Pearson, not Spearman? >>>> >>> >>> Thanks for this Peter - a couple of more questions: >>> >>> a <- rnorm(500) >>> b <- runif(500, min=0, max=1) >>> x <- wilcox.test(a, b, alternative="two.sided", exact=T, paired=T) >>> x$statistic >>> >>> V >>> 31835 >>> >>> What is V? (is that the value Z of the test statistic)? >> >> No. It's the sum of the positive ranks: >> >> r <- rank(abs(x)) >> STATISTIC <- sum(r[x > 0]) >> names(STATISTIC) <- "V" >> >> (where x is actually x-y in the paired case) >> >> Subtract the expected value of V (sum(1:500)/2 == 62625) in your case, and divide by the standard deviation (sqrt(500*501*1001/24)=3232.327) and you get Z=-9.54. The slight discrepancy is likely due to your use of exact=T (so your p value is not actually computed from Z). >> >> >>> >>> z.score <- qnorm(x$p.value/2) >>> [1] -9.805352 >>> >>> But what does this zscore show in practice? >> >> >> That your test statistic is approx. 10 standard deviations away from its mean, if the null hypothesis were to be true. >> >> >>> >>> The d.f. are suggested to be reported here: >>> http://staff.bath.ac.uk/pssiw/stats2/page2/page3/page3.html >>> >> >> Some software replaces the asymptotic normal distribution of the rank sums with the t-distribution with the same df as would be used in an ordinary t test. However, since there is no such thing as an independent variance estimate in the Wilcoxon test, it is hard to see how that should be an improvement. 
I have it down to "coding by non-statistician". >> >> >>> And r is mentioned here >>> http://huberb.people.cofc.edu/Guide/Reporting_Statistics%20in%20Psychology.pdfs >>> >>> >> >> Aha, so it's supposed to be the effect size. On the referenced site they suggest to use r=Z/sqrt(N). (They even do so for the independent samples version, which looks wrong to me). >> >>> >>>>> My questions are: >>>>> >>>>> - Are the above enough/correct values to report (some places even >>>>> quote W and df) ? >>>> >>>> df is silly, and/or blatantly wrong... >>>> >>>>> What else would you suggest? >>>>> - How do I calculate the Z score and r for the above example? >>>>> - How do I get each statistic from the pairwise.wilcox.test call? >>>>> >>>>> Many Thanks >>>>> JP >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. 
>>>> >>>> -- >>>> Peter Dalgaard >>>> Center for Statistics, Copenhagen Business School >>>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >>>> Phone: (+45)38153501 >>>> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com >>>> >>>> >> >> -- >> Peter Dalgaard >> Center for Statistics, Copenhagen Business School >> Solbjerg Plads 3, 2000 Frederiksberg, Denmark >> Phone: (+45)38153501 >> Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com >> >> >> -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From patrick.breheny at uky.edu Wed May 4 16:38:14 2011 From: patrick.breheny at uky.edu (Patrick Breheny) Date: Wed, 4 May 2011 10:38:14 -0400 Subject: [R] create a folder with mode '0777' In-Reply-To: References: Message-ID: <4DC164D6.50404@uky.edu> Linux systems have a user mask that limits the file mode creation possibilities of any processes launched from that shell. If you check your /etc/profile file, you will see the line umask 022 This prevents you by default from creating files with write access for everyone except the user. In other words, this is a linux issue, not an R issue -- the same thing happens when you use mkdir. This can be overridden, however. For example, system("chmod -R 0777 test") which recursively changes the mode of test and all its subdirectories from within R. _______________________ Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky On 05/04/2011 09:55 AM, Xian Zhang wrote: > Dear list, > > I am trying to create a folder structure, say 'test/sub', and set the > folder and sub folder to be writable to everyone. 
> > By default > > dir.create('test/sub', recursive=TRUE, mode='0777') > > creates folders with mode: drwxr-xr-x > > After > > Sys.chmod('test/sub',mode='0777') > > The folder 'test' is: drwxr-xr-x > and the sub folder 'sub' is: drwxrwxrwx > > The question is how to generate a folder and sub folders, with every > folder being drwxrwxrwx ? > > I am using a linux/redhat system. > > Thank you for your help. > Xian > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Wed May 4 16:28:07 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 4 May 2011 07:28:07 -0700 Subject: [R] Format ddmmYYYY in date In-Reply-To: References: <1304499622764-3494921.post@n4.nabble.com> Message-ID: <7A0F8449-B641-4050-8E8F-FB218F663DAD@comcast.net> On May 4, 2011, at 5:57 AM, Ben Bolker wrote: > smoff boku.ac.at> writes: > >> My problem is that I have a table containing dates in the first >> column of 10 >> years. These dates have the format ddmmYYYY at least in the csv- >> file. After >> importing the file using read.table() R deletes the first character >> if it is >> a zero. > > [snip] > >> How do I solve this problem? Is there a way to tell R not to delete >> the >> first character even if it is a zero or to directly read the first >> column as >> date? > > See the "colClasses" argument of ?read.table ... > > (added a little bit of text to make gmane happy) I've had similar problems and this was my first strategy: > test <- c('1241949', '5182001','12252009') > ifelse(nchar(test)==7, paste("0", test, sep=""), test) [1] "01241949" "05182001" "12252009" I then used colClasses, and later simply asked to have all dates in the output format from the database changed to "YYYY-mm-dd". 
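
A small sketch of the colClasses route mentioned above, assuming a two-column file with ddmmYYYY dates in the first column. The file content, separator and column names here are invented for illustration, and read.table(text = ...) needs a reasonably recent R.

```r
# Toy file content; column names and values are made up for illustration
csv_text <- "date;value
01052011;1.5
12112010;2.0"

# Read the date column as character so the leading zero is kept
dat <- read.table(text = csv_text, header = TRUE, sep = ";",
                  colClasses = c("character", "numeric"))

# Convert ddmmYYYY strings to Date objects
dat$date <- as.Date(dat$date, format = "%d%m%Y")
str(dat)

# If the column has already been read as a number, the zero can be
# put back before converting
padded <- sprintf("%08d", 1052011L)
as.Date(padded, format = "%d%m%Y")
```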
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT From dwinsemius at comcast.net Wed May 4 16:34:47 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 4 May 2011 07:34:47 -0700 Subject: [R] what happens when I store linear models in an array? In-Reply-To: <4dc13d51.c9860e0a.331c.2bf6@mx.google.com> References: <4dc13d51.c9860e0a.331c.2bf6@mx.google.com> Message-ID: <69089E95-E117-4C0E-92FB-3619677DB338@comcast.net> On May 4, 2011, at 4:49 AM, Andrew D. Steen wrote: > I've got a bunch of similar datasets, all of which I've fit to linear > models. I'd like to easily create arrays of a specific parameter > from each > linear model (e.g., all of the intercepts in one array). I figured > I'd put > the model objects into an array, and then (somehow) I could easily > create > corresponding arrays of intercepts or residuals or whatever, but I > can't the > parameters back out. > > Right now I've stored the model objects in a 2-D array: >> lms.ASP <- array(list(), c(3,4)) > > Then I fill the array element-by-element: >> surf105.lm. ASP <- lm(ASP ~ time) >> lms.ASP[1,1] <- list(surf105.lm.ASP) > > Something is successfully being stored in the array: >> test <- lms.tx.ASP[1,1] >> test > [[1]] > Call: > lm(formula = ASP ~ time) > Coefficients: > (Intercept) elapsed.time..hr > 0.430732 0.004073 > > But I can't seem to call extraction functions on the linear models: >> fitted(lms.ASP[1,1]) > NUL > > It seems like something less than the actual linear model object is > being > stored in the array, but I don't understand what's happening, or how > to > easily batch-extract parameters of linear models. Any advice? 
>
The problem is that the "[" function is returning a sublist from that
array of lists, which is still a list. You wanted the contents of the
first (and only) element of that list and Andrew Robinson offered you
the solution.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

From ivan.calandra at uni-hamburg.de Wed May 4 16:52:45 2011
From: ivan.calandra at uni-hamburg.de (Ivan Calandra)
Date: Wed, 04 May 2011 16:52:45 +0200
Subject: [R] Storing data from a test as a vector or matrix
In-Reply-To: <1304517307398-3495626.post@n4.nabble.com>
References: <1304517307398-3495626.post@n4.nabble.com>
Message-ID: <4DC1683D.2030209@uni-hamburg.de>

Hi,

I would suggest you to check the structure of your summary object with
str(), like this:
S <- summary(M, test="Pillai")
str(S)
You will then see how to access each element of it.

If you cannot manage to do it yourself, then provide an example, or at
least the output from str(s).
By the way, when you get an error, then copy it so that we have a
better idea of what happens.

HTH,
Ivan

Le 5/4/2011 15:55, wwreith a écrit :
> I just finished a MANOVA test and got the following output:
>
>> summary(M, test="Pillai")
>               Df Pillai approx F num Df den Df    Pr(>F)
> as.factor(X)   3 1.1922   6.5948     36    360 < 2.2e-16 ***
> Residuals    129
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
> I would like to store the values Df=3, Pillai=1.1922, P-value, etc. as a
> vector.
>
> I have tried the following code that did not work:
>
>> S=summary(M, test="Pillai")
>> S1<-as.vector(S$Df, S$Pillai)
> but I am getting an error every time. I have also tried just S$Df. Is there
> a way to find out if "Df" and "Pillai" are correct headers to reference. For
> example maybe it is "df" or "degreesfreedom" etc.
>
> I know the concept works because I have used it for logistic regression.
>
> >v1<- as.vector(exp(L1$coefficients))
>
> The difference is that I know "coefficients" is the correct header to refer
> to.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Storing-data-from-a-test-as-a-vector-or-matrix-tp3495626p3495626.html
> Sent from the R help mailing list archive at Nabble.com.
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. S?ugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calandra at uni-hamburg.de ********** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php From dwinsemius at comcast.net Wed May 4 17:01:08 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 4 May 2011 08:01:08 -0700 Subject: [R] Superscript number before letter In-Reply-To: References: <1304516263466-3495577.post@n4.nabble.com> Message-ID: <036FAF88-AB8D-407C-9CA4-D5FF8F399308@comcast.net> On May 4, 2011, at 7:28 AM, Richard M. Heiberger wrote: > Dos this do what you want? > > plot(1:10, xlab=expression(delta*{}^18*"O" * " VSMOW [?]")) > > The specific is to put an empty item there to hold the superscript. I do not think that is necessary: plot(1:10, xlab=expression(delta^18*O~VSMOW["?"])) (I wasn't sure if the [.] operation was supposed to be a subscript.) I generally avoid spaces in plotmath expressions and use tildes "~" to accomplish any needed spaces. The asterisk "*" is the non-spacing separator between elements. > > On Wed, May 4, 2011 at 9:37 AM, Janhal wrote: > >> Salut, >> I have been struggling to superscript the 18 before the O without ^ >> visible >> and found only help to superscript numbers after the letter. Thanks >> to >> anyone who can help. 
>> xlab=expression(delta*18O VSMOW [?]") >> Cheers, >> Janine >> >> -- David Winsemius, MD Heritage Laboratories West Hartford, CT From dwinsemius at comcast.net Wed May 4 17:06:02 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 4 May 2011 08:06:02 -0700 Subject: [R] Is this confict of different versions of R or something else? In-Reply-To: <1304505844374-3495104.post@n4.nabble.com> References: <1304505844374-3495104.post@n4.nabble.com> Message-ID: On May 4, 2011, at 3:44 AM, TheSavageSam wrote: > Hello! > > I have had some problems lately with use of R at home and school. At > my home > laptop (Ubuntu linux 64bit & R-2.12.0) R works just fine. But when I > take my > codes to school(Windows XP 32bit & R-2.10.1 I think) and run those > there > those codes probably won't work. Functions as combinations() > ?combinations No documentation for 'combinations' in specified packages and libraries: you could try '??combinations' So I (like you) don't have the package with combinations loaded. > didn't work and > expand.grid() worked a bit differently. About expand.grid I just > couldn't > pass a dataframe of parametres but I had to pass all the colums > separately. That is most probably a version difference. The Core group works very diligently to keep the versions returning the same results (when possible) on all platforms. Graphics and operating system-specific tasks (including clipboard access) are obvious exceptions. > > Is this because my university has so old version of R? Or is it > because I am > using Linux at home? Or is it because some libraries aren't installed? > I like R very much, but if there is difference between R in different > operating systems, then I dislike that. > > Can you give me some tips how to avoid these problems? Install > latest R to > my university PC? I don't want to fall back at Windows users -category > anymore. 
>
> --
> TheSavageSam
>
>

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

From ripley at stats.ox.ac.uk Wed May 4 17:07:00 2011
From: ripley at stats.ox.ac.uk (Prof Brian Ripley)
Date: Wed, 4 May 2011 16:07:00 +0100 (BST)
Subject: [R] create a folder with mode '0777'
In-Reply-To:
References:
Message-ID:

Please read the comments in the help about umask (and in the posting
guide about the 'at a minimum' information required in postings, for
the details do depend on the version of R and it seems yours is not
current). In R 2.13.0:

     ‘dir.create’ creates the last element of the path, unless
     ‘recursive = TRUE’. Trailing path separators are discarded. The
     mode will be modified by the ‘umask’ setting in the same way as
     for the system function ‘mkdir’.

so try

um <- Sys.umask(0)
dir.create('test/sub', recursive=TRUE)
Sys.umask(um)

(Whether mkdir -p respects umask depends on your OS ... and the
command-line command and the system call of that name may differ.)

On Wed, 4 May 2011, Xian Zhang wrote:

> Dear list,
>
> I am trying to create a folder structure, say 'test/sub', and set the
> folder and sub folder to be writable to everyone.
>
> By default
>
> dir.create('test/sub', recursive=TRUE, mode='0777')
>
> creates folders with mode: drwxr-xr-x
>
> After
>
> Sys.chmod('test/sub',mode='0777')
>
> The folder 'test' is: drwxr-xr-x
> and the sub folder 'sub' is: drwxrwxrwx
>
> The question is how to generate a folder and sub folders, with every
> folder being drwxrwxrwx ?
>
> I am using a linux/redhat system.
>
> Thank you for your help.
> Xian
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Brian D.
Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From datkins at u.washington.edu Wed May 4 17:13:30 2011 From: datkins at u.washington.edu (David Atkins) Date: Wed, 04 May 2011 08:13:30 -0700 Subject: [R] hurdle, simulated power Message-ID: <4DC16D1A.4000904@u.washington.edu> Hi all-- We are planning an intervention study for adolescent alcohol use, and I am planning to use simulations based on a hurdle model (using the hurdle() function in package pscl) for sample size estimation. The simulation code and power code are below -- note that at the moment the "power" code is just returning the coefficients, as something isn't working quite right. The average estimates from code below are: count_(Intercept) count_trt zero_(Intercept) 2.498327128 -0.000321315 0.910293501 zero_trt -0.200134813 Three of the four look right (ie, converging to population values), but the count_trt is stuck at zero, regardless of sample size (when it should be ~ -0.20). Does anyone see what's wrong? Thanks for any input. 
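
One thing that may be worth checking in the simulation code that follows: trt has length 2*n, but runif(n) and rnegbin(n, ...) are asked for only n draws. As far as I can tell from the MASS source, rnegbin(n, mu = mu, ...) returns n values built from the first n entries of mu (all control rows here), and y1*y2 then recycles those counts for the treated half, which would leave the count-part treatment effect near zero. Below is a minimal sketch of a variant that draws one value per observation. The name sim_hurdle and the parameter values in the usage lines are hypothetical choices for illustration, not taken from the post.

```r
library(MASS)  # for rnegbin()

# rnegbin() returns as many values as requested, even if mu is longer
length(rnegbin(3, mu = rep(c(5, 50), each = 3), theta = 2))

# Hypothetical variant of the simulation: every draw uses N = 2 * n
sim_hurdle <- function(n, beta0, beta1, alpha0, alpha1, theta) {
  trt <- rep(c(0, 1), each = n)
  N <- length(trt)                          # total number of observations

  # zero part, same rule as in the post: y1 = 1 when the uniform exceeds p0
  p0 <- plogis(alpha0 + alpha1 * trt)
  y1 <- as.numeric(runif(N) > p0)

  # count part: one negative binomial draw per observation
  mu <- exp(beta0 + beta1 * trt)
  y2 <- rnegbin(N, mu = mu, theta = theta)

  # redraw zeros so the count part is zero-truncated
  while (any(zero <- y2 == 0)) {
    y2[zero] <- rnegbin(sum(zero), mu = mu[zero], theta = theta)
  }

  data.frame(trt = trt, y = y1 * y2)
}

# Usage with made-up parameter values
set.seed(1)
d <- sim_hurdle(5000, beta0 = 2.5, beta1 = -0.2,
                alpha0 = 0.9, alpha1 = -0.2, theta = 2)
aggregate(y ~ trt, data = d[d$y > 0, ], FUN = mean)
```

With the draws matched to the length of trt, the mean of the positive counts should differ between the two arms, which is a quick sanity check before fitting hurdle() to the simulated data.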
cheers, Dave

mysim <- function(n, beta0, beta1, alpha0, alpha1, theta){
  trt <- c(rep(0,n), rep(1,n))
  ### mean function logit model
  p0 <- exp(alpha0 + alpha1*trt)/(1 + exp(alpha0 + alpha1*trt))
  ### 0 / 1 based on p0
  y1 <- as.numeric(runif(n)>p0)
  ### mean function count portion
  mu <- exp(beta0 + beta1*trt)
  ### estimate counts using NB dist
  require(MASS, quietly = TRUE)
  y2 <- rnegbin(n, mu = mu, theta = theta)
  ### if y2 = 0, draw new value
  while(sum(y2==0)>0){
    y2[which(y2==0)] <- rnegbin(length(which(y2==0)), mu=mu[which(y2==0)], theta = theta)
  }
  y<-y1*y2
  data.frame(trt=trt,y=y)
}

#alpha0, alpha1 is the parameter for zero part
#beta0,beta1 is the parameter for negative binomial
#theta is dispersion parameter for negative binomial, infinity correspond to poisson
#
#example power analysis
#return three power, power1 for zero part, power2 for negative binomial part
#power3 for joint test,significance level can be set, default is 0.05
#M is simulation time
#require pscl package
#library(pscl)

mypower <- function(n, beta0, beta1, alpha0, alpha1, theta, siglevel=0.05, M=1000){
  myfun <- function(n,beta0,beta1,alpha0,alpha1,theta,siglevel){
    data <- mysim(n,beta0,beta1,alpha0,alpha1,theta)
    require(pscl, quietly = TRUE)
    res <- hurdle(y ~ trt, data = data, dist = "negbin", trace = FALSE)
    est <- coef(res)#[c(2,4)]
    #v<-res$vcov[c(2,4),c(2,4)]
    #power1<-as.numeric(2*pnorm(-abs(est)[2]/sqrt(v[2,2]))

Dear R-helpers,
I need to fit natural cubic spline with specified number of knots. I
expected 'splines' package will be helpful, but I am confused by its
help. Is more detailed documentation available for it or could you
recommend another R function?
Best regards
Ondrej Mikula

From jwiley.psych at gmail.com Wed May 4 17:34:03 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Wed, 4 May 2011 08:34:03 -0700
Subject: [R] natural cubic splines
In-Reply-To:
References:
Message-ID:

Hi Ondrej,

What documentation have you looked at? Does this help at all?
require(splines) ?ns ## one example summary(lm(y ~ ns(x, df = 3), data = data.frame(y = runif(100), x = rbinom(100, 9, .25)^2))) ## built in examples example(ns) Also, I am very fond of the book, Modern Applied Statistics with S by Venables & Ripley. It has a section on splines that might help you. Cheers, Josh 2011/5/4 Ond?ej Mikula : > Dear R-helpers, > I need to fit natural cubic spline with specified number of knots. I > expected 'splines' package will be helpful, but I am confused by its > help. Is more detailed documentation available for it or could you > recommend another R function? > Best regards > Ondrej Mikula > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From rmh at temple.edu Wed May 4 17:43:01 2011 From: rmh at temple.edu (Richard M. Heiberger) Date: Wed, 4 May 2011 11:43:01 -0400 Subject: [R] Superscript number before letter In-Reply-To: <036FAF88-AB8D-407C-9CA4-D5FF8F399308@comcast.net> References: <1304516263466-3495577.post@n4.nabble.com> <036FAF88-AB8D-407C-9CA4-D5FF8F399308@comcast.net> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From xavier_abulker at yahoo.fr Wed May 4 17:49:04 2011 From: xavier_abulker at yahoo.fr (xavier abulker) Date: Wed, 4 May 2011 16:49:04 +0100 (BST) Subject: [R] Error Rscript: No such file or directory Message-ID: <199312.35402.qm@web26508.mail.ukl.yahoo.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From csilva at ipimar.pt Wed May 4 17:50:53 2011 From: csilva at ipimar.pt (Cristina Silva) Date: Wed, 04 May 2011 16:50:53 +0100 Subject: [R] Panels order in lattice graphs Message-ID: <4DC175DD.2090806@ipimar.pt> Hi all, In lattice graphs, panels are drawn from left to right and bottom to top. The flag "as.table=TRUE" changes to left to right and top to bottom. Is there any way to change to first top to bottom and then left to right? didn?t find anything neither in Help pages nor Lattice book. Cristina -- ------------------------------------------ Cristina Silva INRB/L-IPIMAR Unidade de Recursos Marinhos e Sustentabilidade Av. de Bras?lia, 1449-006 Lisboa Portugal Tel.: 351 21 3027096 Fax: 351 21 3015948 csilva at ipimar.pt From wdunlap at tibco.com Wed May 4 17:52:07 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 4 May 2011 08:52:07 -0700 Subject: [R] Simple loop In-Reply-To: <20110504075059.GA1116@praha1.ff.cuni.cz> References: <1304437471793-3492819.post@n4.nabble.com><77EB52C6DD32BA4D87471DCD70C8D700042B94AC@NA-PA-VBE03.na.tibco.com> <20110504075059.GA1116@praha1.ff.cuni.cz> Message-ID: <77EB52C6DD32BA4D87471DCD70C8D700042B9652@NA-PA-VBE03.na.tibco.com> > -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Petr Savicky > Sent: Wednesday, May 04, 2011 12:51 AM > To: r-help at r-project.org > Subject: Re: [R] Simple loop > > On Tue, May 03, 2011 at 12:04:47PM -0700, William Dunlap wrote: > [...] 
> > ave() can deal that problem: > > > cbind(x, newCol2 = with(x, ave(H, Site, Prof, > > FUN=function(y)y-min(y)))) > > Site Prof H newCol2 > > 1 1 1 24 8 > > 2 1 1 16 0 > > 3 1 1 67 51 > > 4 1 2 23 0 > > 5 1 2 56 33 > > 6 1 2 45 22 > > 7 2 1 67 21 > > 8 2 1 46 0 > > Warning message: > > In min(y) : no non-missing arguments to min; returning Inf > > The warning is unfortunate: ave() calls FUN even for when > > there is no data for a particular group (Site=2, Prof=2 in this > > case). > > The warning may be avoided using min(y, Inf) instead of min(). Yes, but the fact remains that ave() wastes time and causes unnecessary warnings and errors by calling FUN when it knows it will do nothing with the result (because there are no entries in x with a given combination of the factor levels in the ... arguments). Using paste(Site,Prof) when calling ave() is ugly, in that it forces you to consider implementation details that you expect ave() to take care of (how does paste convert various types to strings?). It also courts errors since paste("A B", "C") and paste("A", "B C") give the same result but represent different Site/Prof combinations. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > cbind(x, newCol2 = with(x, ave(H, Site, Prof, > FUN=function(y)y-min(y,Inf)))) > > Site Prof H newCol2 > 1 1 1 24 8 > 2 1 1 16 0 > 3 1 1 67 51 > 4 1 2 23 0 > 5 1 2 56 33 > 6 1 2 45 22 > 7 2 1 67 21 > 8 2 1 46 0 > > Another approach is to combine Site, Prof to a single column > in any way suitable for the application. For example > > cbind(x, newCol2 = with(x, ave(H, paste(Site, Prof), > FUN=function(y)y-min(y)))) > > Petr Savicky. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
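
A possible middle ground for the ave() discussion above, sketched with the eight-row data from the thread: build one grouping factor with interaction(..., drop = TRUE), so that only the Site/Prof pairs that actually occur become groups and FUN is never handed an empty vector. The name grp is my own; the data values are copied from the quoted output.

```r
x <- data.frame(Site = c(1, 1, 1, 1, 1, 1, 2, 2),
                Prof = c(1, 1, 1, 2, 2, 2, 1, 1),
                H    = c(24, 16, 67, 23, 56, 45, 67, 46))

# One grouping factor holding only the Site/Prof pairs present in the data
grp <- interaction(x$Site, x$Prof, drop = TRUE)
nlevels(grp)   # number of non-empty groups

# Subtract the group minimum within each group
x$newCol2 <- ave(x$H, grp, FUN = function(y) y - min(y))
x
```

This keeps the grouping in factor form rather than going through paste(), at the cost of one extra line.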
>

From jtor14 at gmail.com Wed May 4 17:57:53 2011
From: jtor14 at gmail.com (Justin Haynes)
Date: Wed, 4 May 2011 08:57:53 -0700
Subject: [R] xtable without a loop alongside a ggplot
Message-ID: 

I would like to create a table of my points and identify which
'quadrant' of a plot they are in, with the 'origin' at the means. The
kicker is that I would like to display it right next to or below a
ggplot of the data. Maybe xtable isn't the right thing to use, but it's
the only thing I can think of. Any help is appreciated!

set.seed(144)
x=rnorm(100,mean=5,sd=1)
test<-data.frame(x=x,y=x^2)
test$right<-sapply(test$x,function(x) {mean.x<-mean(test$x);any(x>mean.x)})
test$up<-sapply(test$y,function(y) {mean.y<-mean(test$y);any(y>mean.y)})

for(i in 1:length(test$x)){
  if(test$right[i]==TRUE & test$up[i]==TRUE)
    print(paste(rownames(test[i,]),'is in the upper right quadrant'))
  if(test$right[i]==FALSE & test$up[i]==TRUE)
    print(paste(rownames(test[i,]),'is in the upper left quadrant'))
  if(test$right[i]==TRUE & test$up[i]==FALSE)
    print(paste(rownames(test[i,]),'is in the lower right quadrant'))
  if(test$right[i]==FALSE & test$up[i]==FALSE)
    print(paste(rownames(test[i,]),'is in the lower left quadrant'))
}

I know there's a better way than using a for loop! And I haven't the
foggiest how to use xtable. As I said, the ultimate goal is to create a
plot with a table alongside it showing outliers and where they appear,
using the inout function from the splancs package and a confidence
ellipse from the ellipse package.

Thank you for your help as usual!
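
The loop in the post above can be replaced by vectorised comparisons. A minimal sketch, assuming the same simulated `test` data as in the post; the `quadrant` column name is made up for illustration, and the part of the question about laying out a table next to a plot is not addressed here.

```r
set.seed(144)
x <- rnorm(100, mean = 5, sd = 1)
test <- data.frame(x = x, y = x^2)

# Comparisons are vectorised, so neither sapply() nor for() is needed
test$right <- test$x > mean(test$x)
test$up    <- test$y > mean(test$y)

# 'quadrant' is a hypothetical column name used only in this sketch
test$quadrant <- paste(ifelse(test$up, "upper", "lower"),
                       ifelse(test$right, "right", "left"))

head(test)
table(test$quadrant)
```

The resulting data frame has one label per point, which is the kind of object a table-printing function would then take as input.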
Justin From murdoch.duncan at gmail.com Wed May 4 18:37:55 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 04 May 2011 12:37:55 -0400 Subject: [R] Error Rscript: No such file or directory In-Reply-To: <199312.35402.qm@web26508.mail.ukl.yahoo.com> References: <199312.35402.qm@web26508.mail.ukl.yahoo.com> Message-ID: <4DC180E3.5000804@gmail.com> On 04/05/2011 11:49 AM, xavier abulker wrote: > Hello, > I'm trying to build a simple cpp file using the R CMD SHLIB command and I always > receive the same error message: > > cygwin warning: > MS-DOS style path detected: C:/PROGRA~1/R/R-212~1.1/etc/i386/Makeconf > Preferred POSIX equivalent is: > /cygdrive/c/PROGRA~1/R/R-212~1.1/etc/i386/Makeconf > CYGWIN environment variable option "nodosfilewarning" turns off this warning. > Consult the user's guide for more details about POSIX paths: > http://cygwin.com/cygwin-ug-net/using.html#using-pathnames > gcc -shared -s -static-libgcc -o hello.dll tmp.def hello.o Rscript -e > Rcpp:::LdFlags() -LC:/PROGRA~1 It looks as though you have somehow got some strange environment variables or contents of a Makevars file set, possibly because you have install Rcpp. The "Rscript -e Rcpp:::LdFlags()" should not be there. 
Duncan Murdoch > /R/R-212~1.1/bin/i386 -lR > gcc.exe: Rscript: No such file or directory It looks as though the file > Here is what I do: > > * From the cpp file below: > > //file hello.cpp > #include > void sayhello() { > printf("Hello world\n"); > } > > * I build the file with the DOS command: > > R CMD SHLIB hello.c > > > My confirguration is > R 2.12.2 > Windows XP 2002 SP3 > I have installed the Rtools components and have included the following links > into my path: > > C:\Rtools\bin; > C:\Rtools\perl\bin; > C:\Rtools\MinGW\bin; > C:\Program Files\R\R-2.12.1\bin > > Thanks a lot for you help > Xavier > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From xavier_abulker at yahoo.fr Wed May 4 18:50:43 2011 From: xavier_abulker at yahoo.fr (xavier abulker) Date: Wed, 4 May 2011 17:50:43 +0100 (BST) Subject: [R] Error Rscript: No such file or directory In-Reply-To: <4DC180E3.5000804@gmail.com> References: <199312.35402.qm@web26508.mail.ukl.yahoo.com> <4DC180E3.5000804@gmail.com> Message-ID: <312680.49877.qm@web26503.mail.ukl.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mi2kelgrum at yahoo.com Wed May 4 19:14:25 2011 From: mi2kelgrum at yahoo.com (Mikkel Grum) Date: Wed, 4 May 2011 10:14:25 -0700 (PDT) Subject: [R] tryCatch? Message-ID: <454478.53335.qm@web65708.mail.ac4.yahoo.com> I would like to do inserts into a database table, but do updates in the fairly rare cases in which the inserts fail. I thought tryCatch might be the way to do it, but I honestly do not understand the help file for tryCatch at all. 
I thought something like this might work: for (i in seq(along = tbl$key)) { tryCatch(sqlSave(pg, tbl[i, ], "tbl", append = TRUE, rownames = FALSE), if (fails) do {sqlUpdate(pg, tbl[i, ], index = "key")}) } This obviously isn't the correct syntax, but could tryCatch do this if I got the syntax right? And any tips on what the right syntax would be? Mikkel From savicky at praha1.ff.cuni.cz Wed May 4 19:21:11 2011 From: savicky at praha1.ff.cuni.cz (Petr Savicky) Date: Wed, 4 May 2011 19:21:11 +0200 Subject: [R] Simple loop In-Reply-To: <77EB52C6DD32BA4D87471DCD70C8D700042B9652@NA-PA-VBE03.na.tibco.com> References: <20110504075059.GA1116@praha1.ff.cuni.cz> <77EB52C6DD32BA4D87471DCD70C8D700042B9652@NA-PA-VBE03.na.tibco.com> Message-ID: <20110504172111.GA23007@praha1.ff.cuni.cz> On Wed, May 04, 2011 at 08:52:07AM -0700, William Dunlap wrote: > > -----Original Message----- > > From: r-help-bounces at r-project.org > > [mailto:r-help-bounces at r-project.org] On Behalf Of Petr Savicky > > Sent: Wednesday, May 04, 2011 12:51 AM > > To: r-help at r-project.org > > Subject: Re: [R] Simple loop > > > > On Tue, May 03, 2011 at 12:04:47PM -0700, William Dunlap wrote: > > [...] > > > ave() can deal that problem: > > > > cbind(x, newCol2 = with(x, ave(H, Site, Prof, > > > FUN=function(y)y-min(y)))) > > > Site Prof H newCol2 > > > 1 1 1 24 8 > > > 2 1 1 16 0 > > > 3 1 1 67 51 > > > 4 1 2 23 0 > > > 5 1 2 56 33 > > > 6 1 2 45 22 > > > 7 2 1 67 21 > > > 8 2 1 46 0 > > > Warning message: > > > In min(y) : no non-missing arguments to min; returning Inf > > > The warning is unfortunate: ave() calls FUN even for when > > > there is no data for a particular group (Site=2, Prof=2 in this > > > case). > > > > The warning may be avoided using min(y, Inf) instead of min(). 
>
> Yes, but the fact remains that ave() wastes time and causes
> unnecessary warnings and errors by calling FUN when it knows
> it will do nothing with the result (because there are no entries
> in x with a given combination of the factor levels in the ...
> arguments).

I agree. For the original question, avoiding the warning is
preferable. The general question belongs more to R-devel.

> Using paste(Site,Prof) when calling ave() is ugly, in that it
> forces you to consider implementation details that you expect
> ave() to take care of (how does paste convert various types
> to strings?). It also courts errors since paste("A B", "C")
> and paste("A", "B C") give the same result but represent different
> Site/Prof combinations.

Thank you for this remark. I used the formulation "combine ... in any
way suitable for the application" with this effect in mind, but let us
be more specific. For numbers, in particular integers, paste() seems to
be good enough. For character vectors, a possible approach is

  paste(X, Y, sep="\r")

since the character "\r" is unlikely to be used in character vectors. A
similar approach is used, for example, in unique.matrix(). I did not
like it much, but it also has advantages.

Petr Savicky.

From ripley at stats.ox.ac.uk Wed May 4 19:21:45 2011
From: ripley at stats.ox.ac.uk (Prof Brian Ripley)
Date: Wed, 4 May 2011 18:21:45 +0100 (BST)
Subject: [R] tryCatch?
In-Reply-To: <454478.53335.qm@web65708.mail.ac4.yahoo.com>
References: <454478.53335.qm@web65708.mail.ac4.yahoo.com>
Message-ID: 

Start with try(): you may find it easier to understand.

if(inherits(try(), "try-error"))

so in your case

if(inherits(try(sqlSave(pg, tbl[i, ], "tbl", append = TRUE,
                        rownames = FALSE)), "try-error"))
    sqlUpdate(pg, tbl[i, ], index = "key")

or some such.

On Wed, 4 May 2011, Mikkel Grum wrote:

> I would like to do inserts into a database table, but do updates in the fairly rare cases in which the inserts fail.
I thought tryCatch might be the way to do it, but I honestly do not understand the help file for tryCatch at all. > > I thought something like this might work: > for (i in seq(along = tbl$key)) { > tryCatch(sqlSave(pg, tbl[i, ], "tbl", append = TRUE, rownames = FALSE), > if (fails) do {sqlUpdate(pg, tbl[i, ], index = "key")}) > } > > This obviously isn't the correct syntax, but could tryCatch do this if I got the syntax right? And any tips on what the right syntax would be? > > Mikkel > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From mhramire at uc.cl Wed May 4 19:22:54 2011 From: mhramire at uc.cl (=?ISO-8859-1?Q?Mat=EDas_Ram=EDrez_Salgado?=) Date: Wed, 4 May 2011 14:22:54 -0300 Subject: [R] problem with package "adapt" for R in Mac In-Reply-To: <4DC11D2C.3020901@statistik.tu-dortmund.de> References: <20110504071919.GV48756@ms.unimelb.edu.au> <4DC11D2C.3020901@statistik.tu-dortmund.de> Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From ligges at statistik.tu-dortmund.de Wed May 4 19:26:49 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Wed, 04 May 2011 19:26:49 +0200 Subject: [R] problem with package "adapt" for R in Mac In-Reply-To: References: <20110504071919.GV48756@ms.unimelb.edu.au> <4DC11D2C.3020901@statistik.tu-dortmund.de> Message-ID: <4DC18C59.4060706@statistik.tu-dortmund.de> On 04.05.2011 19:22, Mat?as Ram?rez Salgado wrote: > The package is called: "adapt" (adapt_1.0-4.tgz is the version of > package for mac, search it in google) > is not in all CRAN servers, because is an old package. Yes, there is a reason why the package was archived (which I said already!): It does not pass the checks for new versions of R. Either get a source version from the archives of CRAN and try to install it from sources (which may require fixes) or use another package that provides similar features such as - R2Cuba - cubature Uwe Ligges > I tried to install it from package manager as CRAN (binaries), Other > repositorys, and as a local source packages, and allways returns this > messaje: > > [Workspace restored from /Users/matiashernanramirezsalgado/.RData] > Durante la inicializaci'on - Mensajes de aviso perdidos > 1: Setting LC_CTYPE failed, using "C" > 2: Setting LC_TIME failed, using "C" > 3: Setting LC_MESSAGES failed, using "C" > * installing *binary* package 'adapt' ... > > * DONE (adapt) > > > > library(adapt) > Error: package 'adapt' was built before R 2.10.0: please re-install it > > > > > 2011/5/4 Uwe Ligges > > > > > On 04.05.2011 09:19, Andrew Robinson wrote: > > Hi, > > Is there such a package? > > > There was such a package that is archived now. > > > I can't find it on CRAN. Can you let us > know exactly how you tried to install it, and what the error message > was (if any)? 
> > > The OP is probably looking for packages such as > - R2Cuba > - cubature > > Uwe Ligges > > > > > > Cheers > > Andrew > > > On Wed, May 04, 2011 at 01:29:37AM -0300, Mat?as Ram?rez Salgado > wrote: > > Hi, > > How i can install the package "adapt" in some version of R > for mac? > > i try in 2.13, 2.9,2.7 and other previous versions... and > nothing happens. > > and another question: There are some packages that do the > same but that it > is implemented for mac? (calculate integrals in 2 or more > dimmensions). > > help me please, it's for an important work. > > greetings. > > > -- > Mat?as Hern?n Ram?rez Salgado. > Estudiante de Estad?stica. > Pontificia Universidad Cat?lica de Chile. > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible > code. > > > > > > > -- > Mat?as Hern?n Ram?rez Salgado. > Estudiante de Estad?stica. > Pontificia Universidad Cat?lica de Chile. > From dwinsemius at comcast.net Wed May 4 19:30:00 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 4 May 2011 10:30:00 -0700 Subject: [R] Superscript number before letter In-Reply-To: References: <1304516263466-3495577.post@n4.nabble.com> <036FAF88-AB8D-407C-9CA4-D5FF8F399308@comcast.net> Message-ID: <7A86A972-F415-469F-A314-0889354BB188@comcast.net> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Wed May 4 19:34:22 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 4 May 2011 10:34:22 -0700 Subject: [R] natural cubic splines In-Reply-To: References: Message-ID: <85F0C74E-356B-4C90-80AC-AACA3310FD87@comcast.net> On May 4, 2011, at 8:34 AM, Joshua Wiley wrote: > Hi Ondrej, > > What documentation have you looked at? 
Does this help at all?
>
> require(splines)
> ?ns
> ## one example
> summary(lm(y ~ ns(x, df = 3),
>   data = data.frame(y = runif(100), x = rbinom(100, 9, .25)^2)))
>
> ## built in examples
> example(ns)
>
> Also, I am very fond of the book, Modern Applied Statistics with S by
> Venables & Ripley. It has a section on splines that might help you.
>

Agree on that last point entirely. I understand that "restricted cubic
splines" is another name for natural splines (but would welcome
correction if that understanding is in error). They are used extensively
in the rms/Hmisc package combination with the supporting text
"Regression Modeling Strategies".

-- 
David

> Cheers,
>
> Josh
>
>
> 2011/5/4 Ondřej Mikula :
>> Dear R-helpers,
>> I need to fit natural cubic spline with specified number of knots. I
>> expected 'splines' package will be helpful, but I am confused by its
>> help. Is more detailed documentation available for it or could you
>> recommend another R function?
>> Best regards
>> Ondrej Mikula
>>
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

From mi2kelgrum at yahoo.com Wed May 4 19:59:41 2011
From: mi2kelgrum at yahoo.com (Mikkel Grum)
Date: Wed, 4 May 2011 10:59:41 -0700 (PDT)
Subject: [R] tryCatch?
In-Reply-To: 
Message-ID: <152619.74752.qm@web65712.mail.ac4.yahoo.com>

Beautiful Prof. This worked:

for (i in seq(along = tbl$key)) {
    if (inherits(try(sqlSave(pg, tbl[i, ], "tbl", append = TRUE,
        rownames = FALSE), silent = TRUE), "try-error", TRUE))
        sqlUpdate(pg, tbl[i, ], index = "key1")
}

--- On Wed, 5/4/11, Prof Brian Ripley wrote:

> From: Prof Brian Ripley
> Subject: Re: [R] tryCatch?
> To: "Mikkel Grum"
> Cc: "R Help"
> Date: Wednesday, May 4, 2011, 12:21 PM
> Start with try(): you may find it
> easier to understand.
>
> if(inherits(try(), "try-error"))
>
>
> so in your case
>
> if(inherits(try(sqlSave(pg, tbl[i, ], "tbl", append = TRUE,
>                         rownames = FALSE))))
>     sqlUpdate(pg, tbl[i, ], index = "key")
>
> or some such.
>
> On Wed, 4 May 2011, Mikkel Grum wrote:
>
> > I would like to do inserts into a database table, but do updates in
> > the fairly rare cases in which the inserts fail. I thought tryCatch
> > might be the way to do it, but I honestly do not understand the help
> > file for tryCatch at all.
> >
> > I thought something like this might work:
> > for (i in seq(along = tbl$key)) {
> >     tryCatch(sqlSave(pg, tbl[i, ], "tbl", append = TRUE, rownames = FALSE),
> >         if (fails) do {sqlUpdate(pg, tbl[i, ], index = "key")})
> > }
> >
> > This obviously isn't the correct syntax, but could tryCatch do this
> > if I got the syntax right? And any tips on what the right syntax
> > would be?
> >
> > Mikkel
> >
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

From lucia.canas at co.ieo.es Wed May 4 20:26:06 2011
From: lucia.canas at co.ieo.es (Lucia Cañas)
Date: Wed, 4 May 2011 20:26:06 +0200
Subject: [R] combine lattice plot and standard R plot
Message-ID: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From biomathjdaily at gmail.com Wed May 4 20:44:41 2011
From: biomathjdaily at gmail.com (Jonathan Daily)
Date: Wed, 4 May 2011 14:44:41 -0400
Subject: [R] combine lattice plot and standard R plot
In-Reply-To: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es>
References: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es>
Message-ID: 

If you read the help documentation, lattice is not really compatible
with standard graphics.

library("lattice")
?lattice

2011/5/4 Lucia Cañas :
> Dear R users,
>
> I would like to combine lattice plot (xyplot) and standard R plot
> (plot and plotCI) in an unique figure.
>
> I use the function "par()" to combine plot and plotCI and I use the
> function "print()" to combine xyplot. I tried to use these functions
> to combine xyplot and plotCI and plots but they do not work. Does
> anybody know how I can do this?
>
> Thank you very much in advance.
>
>
>
>
> Lucía Cañás Ferreiro
>
> Instituto Español de Oceanografía
> Centro Oceanográfico de A coruña
> Paseo Marítimo Alcalde Francisco Vázquez, 10
> 15001 - A Coruña, Spain
>
> Tel: +34 981 218151  Fax: +34 981 229077
> lucia.canas at co.ieo.es
> http://www.ieo.es
>
>
>
>        [[alternative HTML version deleted]]
>
>

-- 
===============================================
Jon Daily
Technician
===============================================
#!/usr/bin/env outside
# It's great, trust me.
From scttchamberlain4 at gmail.com Wed May 4 20:47:20 2011 From: scttchamberlain4 at gmail.com (Scott Chamberlain) Date: Wed, 4 May 2011 13:47:20 -0500 Subject: [R] combine lattice plot and standard R plot In-Reply-To: References: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From johannesraja at gmail.com Wed May 4 21:41:36 2011 From: johannesraja at gmail.com (johannes rara) Date: Wed, 4 May 2011 22:41:36 +0300 Subject: [R] Regexp question Message-ID: I have a string like this st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), COUNT(salary), FROM Employees" How can I remove the last comma before the FROM statement? -J From x-jess-h-x at hotmail.co.uk Wed May 4 16:19:39 2011 From: x-jess-h-x at hotmail.co.uk (blutack) Date: Wed, 4 May 2011 07:19:39 -0700 (PDT) Subject: [R] Saving Values in a Vector from a For Loop Message-ID: <1304518779740-3495714.post@n4.nabble.com> Hi, I have a created a function, but now I need to call it about a hundred times and store the results as a vector. I think doing a for loop would work, but I cant work out how to save the values generated from the function as a vector. Any ideas? Thanks. -- View this message in context: http://r.789695.n4.nabble.com/Saving-Values-in-a-Vector-from-a-For-Loop-tp3495714p3495714.html Sent from the R help mailing list archive at Nabble.com. From behrendt.l at gmail.com Wed May 4 16:27:48 2011 From: behrendt.l at gmail.com (Muhidini) Date: Wed, 4 May 2011 07:27:48 -0700 (PDT) Subject: [R] Distance measure in heatmaps Message-ID: <1304519268530-3495734.post@n4.nabble.com> Hi, First- I am relatively new to R and would therefore greatly appreciate any kind of help from you ! I am currently trying to use R to make a heatmap using the Pheatmap package. 
My question: I want to include a different distance measure than the ones given by the implemented hclust function in the pheatmap package (such as euclidean etc.). I have calculated a distance matrix somewhere else (unifrac for those who are interested) and want to cluster my columns according to this matrix. Is there a way to do this ? Thanks for any kind of input Muhidini -- View this message in context: http://r.789695.n4.nabble.com/Distance-measure-in-heatmaps-tp3495734p3495734.html Sent from the R help mailing list archive at Nabble.com. From x-jess-h-x at hotmail.co.uk Wed May 4 16:30:20 2011 From: x-jess-h-x at hotmail.co.uk (blutack) Date: Wed, 4 May 2011 07:30:20 -0700 (PDT) Subject: [R] Uniform Gaussian Kernel Message-ID: <1304519420181-3495742.post@n4.nabble.com> I have a vector with lots of different numbers. I need to make a graph showing the Uniform Distribution of the figures. I have created a graph showing all the different values, but now want individual Gaussian Kernel round each point. This is what I have but each time it comes up with an error as I have just based it on the Normal Distribution, but I'm not sure what I need to change to make it work. Where z is my vector. plot(0, 0, xlim=range(0, 300), ylim=range(0, 1), pch=NA,) for(i in 1:length(z)) { points(z[i], 0, pch="|") } x = seq(-10, 10, 0.01) for(i in 1:length(z)){ std_dev = 1 lines(x, dunif(x, z[i], sd = std_dev)) } Any ideas? Thanks. -- View this message in context: http://r.789695.n4.nabble.com/Uniform-Gaussian-Kernel-tp3495742p3495742.html Sent from the R help mailing list archive at Nabble.com. From Alfredo.Roccato at unicredit.eu Wed May 4 16:32:14 2011 From: Alfredo.Roccato at unicredit.eu (Roccato Alfredo (UniCredit)) Date: Wed, 4 May 2011 16:32:14 +0200 Subject: [R] join tables in R Message-ID: <05A5593D7A426542B6F9D8C3B538CF9C156EB6B301@USEXCPWM07.mc01.unicreditgroup.eu> An embedded and charset-unspecified text was scrubbed... 
Name: not available
URL: 

From janine.halder at unil.ch Wed May 4 16:52:48 2011
From: janine.halder at unil.ch (Janhal)
Date: Wed, 4 May 2011 07:52:48 -0700 (PDT)
Subject: [R] Superscript number before letter
In-Reply-To: 
References: <1304516263466-3495577.post@n4.nabble.com>
Message-ID: <1304520768833-3495812.post@n4.nabble.com>

Yes that's it :-)
Thank you very much!
Janine

--
View this message in context: http://r.789695.n4.nabble.com/Superscript-number-before-letter-tp3495577p3495812.html
Sent from the R help mailing list archive at Nabble.com.

From marchywka at hotmail.com Wed May 4 17:21:33 2011
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 4 May 2011 11:21:33 -0400
Subject: [R] nls problem with R
In-Reply-To: <1304518064519-3495672.post@n4.nabble.com>
References: <1304482083098-3494454.post@n4.nabble.com>, <20110504071506.GU48756@ms.unimelb.edu.au>, <1304518064519-3495672.post@n4.nabble.com>
Message-ID: 

> Date: Wed, 4 May 2011 07:07:44 -0700
> From: sterlesser at hotmail.com
> To: r-help at r-project.org
> Subject: Re: [R] nls problem with R
>
> Thanks Andrew.
> I am sorry for some typos that I omit some numbers of T2.
> Based on your suggestion,I think the problem is in the initial values.
> And I will read more theory about the non-linear regression.

There is unlikely to be any magic involved, unlike getting hotmail to
work. As a tool for understanding your data, you should have some idea
of the qualitative properties of the model and data, and of the error
function you use to reconcile the two. If you can post your full data
set I may post an R example of some things to try. I was looking for an
excuse to play with nls; I'm no expert here, and I am curious to see
what I can do with your example for critique by others.

If you want to fully automate this for N continuous parameters, you can
take a shotgun approach, but I am not sure it helps other than to find
gross problems in the model or data.
I actually wrote a loop to keep picking random parameter values and
calculate an SSE between predicted and real data. What you soon find is
that this is like trying to decode a good crypto algorithm by guessing:
you can do the math to see the problem LOL.

>
> --
> View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3495672.html
> Sent from the R help mailing list archive at Nabble.com.
>

From Marie-Line.Glaesener at uni.lu Wed May 4 17:30:05 2011
From: Marie-Line.Glaesener at uni.lu (Marie-Line Glaesener)
Date: Wed, 4 May 2011 15:30:05 +0000
Subject: [R] Instrumental variable quantile estimation of spatial autoregressive models
Message-ID: <99D322B6F7951F468D4788B2CB077C7B0BD10162@trip.uni.lux>

Dear all,

I would like to implement a spatial quantile regression using
instrumental variable estimation (according to Su and Yang (2007),
Instrumental variable quantile estimation of spatial autoregressive
models, SMU economics & statistics working paper series, 2007, 05-2007,
p.35).

I am applying the hedonic pricing method on land transactions in
Luxembourg. My original data set contains 4335 observations.

I'm quite new to R and would like to ask if someone has implemented the
method proposed by Su and Yang in R, or if anyone could give me a hint
on the different codes and steps?

Please find attached a small sample of my data and matrix.
R code:

library(foreign)
library(lmtest)
library(spdep)
library(quantreg)

data<-read.table("DataSample.txt",header=TRUE, sep="")
attach(data)
matrix<-read.gwt2nb("matrixsample.gwt" ,region.id=no_Trans)
matrix.listw<-nb2listw(matrix)

# OLS model
OLS<-lm(lnprice~surface+d2007+LUX+tsect_ci, data=data)
summary(OLS)

# SAR model
SAR<-lagsarlm(lnprice~surface+d2007+LUX+tsect_ci, data=data, listw = matrix.listw)
summary(SAR)

I hope that this information is sufficient and will help you to help me :)

Many thanks in advance,

Marie-Line Glaesener
PhD student
Unité de Recherche IPSE (Identités. Politiques, Sociétés, Espaces)
Laboratoire de Géographie et Aménagement du Territoire
UNIVERSITÉ DU LUXEMBOURG
CAMPUS WALFERDANGE
Route de Diekirch / BP 2
L-7201 Walferdange
Luxembourg
www.geo.ipse.uni.lu
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: DataSample.txt
URL: 

From sarahboule at hotmail.com Wed May 4 17:55:56 2011
From: sarahboule at hotmail.com (boule)
Date: Wed, 4 May 2011 08:55:56 -0700 (PDT)
Subject: [R] Outlier removal by Principal Component Analysis : error message
Message-ID: <1304524556976-3496023.post@n4.nabble.com>

Hi,

I am currently analysing Raman spectroscopic data with the hyperSpec
package. I consulted the documentation on this package and I found an
example work-flow dedicated to Raman spectroscopy (see the address:
http://hyperspec.r-forge.r-project.org/chondro.pdf).

I am currently trying to remove outliers using PCA, just as they did in
the documentation, but I get an error message I can't explain.
Here is my code:

#import the data :
T=read.table('bladder bis concatenation colonne.txt',header=TRUE)
spec=new("hyperSpec",wavelength=T[,1],spc=t(T[,-1]),
         data=data.frame(sample=colnames(T[,-1])),
         label=list(.wavelength="Raman shift (cm-1)",spc="Intensity (a.u.)"))

#baseline correction of the spectra
spec=spec[,,500~1800]
bl=spc.fit.poly.below(spec)
spec=spec-bl

#normalization of the spectra
spec=sweep(spec,1,apply(spec,1,mean),'/')

#PCA
pca=prcomp(~ spc,data=spec$.,center=TRUE)
scores=decomposition(spec,pca$x,label.wavelength="PC",label.spc="score/a.u.")
loadings=decomposition(spec,t(pca$rotation),scores=FALSE,label.spc="loading I/a.u.")

#plot the scores of the first 20 PC against all other to have an idea
#where to find the outliers
pairs(scores[[,,1:20]],pch=19,cex=0.5)

#identify the outliers thanks to "map.identify"
out=map.identify(scores[,,5])
Error in `[.data.frame`(x@data, , j, drop = FALSE) :
  undefined columns selected

Does anybody understand where the problem comes from? And does anybody
know another way to find spectra outliers?

Thank you in advance.

Boule

--
View this message in context: http://r.789695.n4.nabble.com/Outlier-removal-by-Principal-Component-Analysis-error-message-tp3496023p3496023.html
Sent from the R help mailing list archive at Nabble.com.

From pangdu at vt.edu Wed May 4 18:24:50 2011
From: pangdu at vt.edu (Pang Du)
Date: Wed, 4 May 2011 12:24:50 -0400
Subject: [R] two-way group mean prediction in survreg with three factors
Message-ID: 

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From sbpurohit at gmail.com Wed May 4 19:53:55 2011
From: sbpurohit at gmail.com (1Rnwb)
Date: Wed, 4 May 2011 10:53:55 -0700 (PDT)
Subject: [R] merging multiple columns from two dataframes
Message-ID: <1304531635175-3496341.post@n4.nabble.com>

Hello,

I have data in a dataframe with 139104 rows, which is a multiple of
96x1449. I have a phenotype file which contains the phenotype
information for the 96 samples.
The snp name is repeated 1449X96 samples. I have to merge the two
dataframes based on sid and sen. This is what my two dataframes look
like:

dat<-data.frame(snpname=rep(letters[1:12],12),sid=rep(1:12,each=12),
                genotype=rep(c('aa','ab','bb'), 12))
pheno<-data.frame(sen=1:12,disease=rep(c('N','Y'),6), wellid=1:12)

I have to merge or add the disease column and 3 other columns to the
data file. I am unable to use merge in R. I have searched Google; I
guess I am not hitting the correct terms to get the answer. I would
appreciate any input on this issue.

thanks
sharad

--
View this message in context: http://r.789695.n4.nabble.com/merging-multiple-columns-from-two-dataframes-tp3496341p3496341.html
Sent from the R help mailing list archive at Nabble.com.

From kagba2006 at yahoo.com Wed May 4 18:47:09 2011
From: kagba2006 at yahoo.com (FMH)
Date: Wed, 4 May 2011 09:47:09 -0700 (PDT)
Subject: [R] best subset regression in R
Message-ID: <792436.61050.qm@web38308.mail.mud.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From kagba2006 at yahoo.com Wed May 4 18:53:54 2011
From: kagba2006 at yahoo.com (FMH)
Date: Wed, 4 May 2011 09:53:54 -0700 (PDT)
Subject: [R] Box-Cox transformation in R
Message-ID: <435617.42947.qm@web38305.mail.mud.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From dan.abner99 at gmail.com Wed May 4 19:42:56 2011
From: dan.abner99 at gmail.com (Dan Abner)
Date: Wed, 4 May 2011 13:42:56 -0400
Subject: [R] SAPPLY function XXXX
Message-ID: 

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From nilaya.sharma at gmail.com Wed May 4 17:37:19 2011
From: nilaya.sharma at gmail.com (Nilaya Sharma)
Date: Wed, 4 May 2011 11:37:19 -0400
Subject: [R] Fwd: simple question
In-Reply-To: 
References: 
Message-ID: 

An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From reith_william at bah.com Wed May 4 17:18:09 2011 From: reith_william at bah.com (wwreith) Date: Wed, 4 May 2011 08:18:09 -0700 (PDT) Subject: [R] Storing data from a test as a vector or matrix In-Reply-To: <1304517307398-3495626.post@n4.nabble.com> References: <1304517307398-3495626.post@n4.nabble.com> Message-ID: <1304522289989-3495901.post@n4.nabble.com> I figured out that attributes is the command that I was trying to find. It allowed me to find out that I was needing to use "stats" not "Df" or "Pillai" etc. Following command worked. > S1<-as.vector(S$stats[1,]) However when I try the same thing with summary.aov it is not working. >SA<-summary.aov(M) >SA1<-as.vector(SA$Reponse IPS1) or >SA1<-as.vector(SA$Reponse IPS1[1,]) Using attributes command I get " Response IPS1". I have tried several variations like including the first space in the quotes, deleting the space between the two words, adding in [1,], etc. The error is stating unexpected text. I have even tried using " " which works for the stats line above but does not work here. Again thanks for any suggestions on what I am not understanding here. -- View this message in context: http://r.789695.n4.nabble.com/Storing-data-from-a-test-as-a-vector-or-matrix-tp3495626p3495901.html Sent from the R help mailing list archive at Nabble.com. From reith_william at bah.com Wed May 4 17:31:47 2011 From: reith_william at bah.com (wwreith) Date: Wed, 4 May 2011 08:31:47 -0700 (PDT) Subject: [R] Storing data from a test as a vector or matrix In-Reply-To: <4DC1683D.2030209@uni-hamburg.de> References: <1304517307398-3495626.post@n4.nabble.com> <4DC1683D.2030209@uni-hamburg.de> Message-ID: <1304523107788-3495950.post@n4.nabble.com> SA gives the output: Response IPS1 : Df Sum Sq Mean Sq F value Pr(>F) as.factor(WSD) 3 3.3136 1.10455 23.047 5.19e-12 *** Residuals 129 6.1823 0.04793 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 . . . There are 11 more just like this output. 
Just increment IPS1 to IPS2, etc. Goal: save "3 3.3136 1.10455 23.047 5.19e-12" as a vector. Str(SA) gives the output: str(SA) > str(SA) List of 12 " $ Response IPS1 :Classes 'anova' and 'data.frame': 2 obs. of 5 variables:" ..$ Df : num [1:2] 3 129 ..$ Sum Sq : num [1:2] 3.31 6.18 ..$ Mean Sq: num [1:2] 1.1045 0.0479 ..$ F value: num [1:2] 23 NA ..$ Pr(>F) : num [1:2] 5.19e-12 NA There are several more but they are just repeats of this one only with IPS2, IPS3,... The command: > SA1<-as.vector(SA$"Reponse IPS1") Returns >NULL As do several variations I have tried. Any ideas. -- View this message in context: http://r.789695.n4.nabble.com/Storing-data-from-a-test-as-a-vector-or-matrix-tp3495626p3495950.html Sent from the R help mailing list archive at Nabble.com. From Steve_Friedman at nps.gov Wed May 4 21:54:29 2011 From: Steve_Friedman at nps.gov (Steve_Friedman at nps.gov) Date: Wed, 4 May 2011 15:54:29 -0400 Subject: [R] join tables in R In-Reply-To: <05A5593D7A426542B6F9D8C3B538CF9C156EB6B301@USEXCPWM07.mc01.unicreditgroup.eu> Message-ID: Look at the merge command ?merge Steve Friedman Ph. D. Ecologist / Spatial Statistical Analyst Everglades and Dry Tortugas National Park 950 N Krome Ave (3rd Floor) Homestead, Florida 33034 Steve_Friedman at nps.gov Office (305) 224 - 4282 Fax (305) 224 - 4147 "Roccato Alfredo (UniCredit)" "r-help at r-project.org" Sent by: r-help-bounces at r- cc project.org Subject [R] join tables in R 05/04/2011 10:32 AM I'd to match-merge 2 tables in such a manner that I keep all the rows in table 1, but not the rows that are in both table 1 and 2. Thank you for your help, Alfredo > master <- data.frame(ID=2001:2011) > train <- data.frame(ID=2004:2006) > valid <- ??? 
in this example table valid should have the following > str(valid) Year: int 2001 2002 2003 2007 2008 2009 2010 2011 in SAS I'd do the following: data master; do id=2001 to 2011; output; end; run; data train; do id=2004 to 2006; output; end; run; data valid; merge master(in=a) train(in=b); by id; if a and not b; run; and in SQL: create table valid as select a.* from master where ID not in (select ID from train) [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From wwwhsd at gmail.com Wed May 4 22:00:49 2011 From: wwwhsd at gmail.com (Henrique Dallazuanna) Date: Wed, 4 May 2011 17:00:49 -0300 Subject: [R] Regexp question In-Reply-To: References: Message-ID: Try this: gsub(",\\s+FROM", " FROM", st) On Wed, May 4, 2011 at 4:41 PM, johannes rara wrote: > I have a string like this > > st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), > COUNT(salary), FROM Employees" > > How can I remove the last comma before the FROM statement? > > -J > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Henrique Dallazuanna Curitiba-Paran?-Brasil 25? 25' 40" S 49? 
16' 22" O

From mailinglist.honeypot at gmail.com Wed May 4 22:02:18 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Wed, 4 May 2011 16:02:18 -0400
Subject: [R] Saving Values in a Vector from a For Loop
In-Reply-To: <1304518779740-3495714.post@n4.nabble.com>
References: <1304518779740-3495714.post@n4.nabble.com>
Message-ID:

Hi,

On Wed, May 4, 2011 at 10:19 AM, blutack wrote:
> Hi,
> I have a created a function, but now I need to call it about a hundred times
> and store the results as a vector.
> I think doing a for loop would work, but I cant work out how to save the
> values generated from the function as a vector. Any ideas?

R> n.times <- 100
R> result <- numeric(n.times) ## assuming your function returns numeric
R> for (i in 1:n.times) {
  result[i] <- myfunction(...)
}

or

R> result <- replicate(n.times, myfunction(...))

or if you need the index

R> result <- sapply(seq(n.times), function(i) myfunction(i, ...))

I guess you get the idea ...

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From savicky at praha1.ff.cuni.cz Wed May 4 22:13:35 2011
From: savicky at praha1.ff.cuni.cz (Petr Savicky)
Date: Wed, 4 May 2011 22:13:35 +0200
Subject: [R] join tables in R
In-Reply-To: <05A5593D7A426542B6F9D8C3B538CF9C156EB6B301@USEXCPWM07.mc01.unicreditgroup.eu>
References: <05A5593D7A426542B6F9D8C3B538CF9C156EB6B301@USEXCPWM07.mc01.unicreditgroup.eu>
Message-ID: <20110504201335.GA22684@praha1.ff.cuni.cz>

On Wed, May 04, 2011 at 04:32:14PM +0200, Roccato Alfredo (UniCredit) wrote:
> I'd to match-merge 2 tables in such a manner that I keep all the rows in table 1, but not the rows that are in both table 1 and 2.
> Thank you for your help,
> Alfredo
>
> > master <- data.frame(ID=2001:2011)
> > train <- data.frame(ID=2004:2006)
> > valid <- ???
> >
> > in this example table valid should have the following
> >
> > > str(valid)
> Year: int 2001 2002 2003 2007 2008 2009 2010 2011

Hi.

Try the following, which assumes that "train" is a subset
of "master".

master <- data.frame(ID=2001:2011)
train <- data.frame(ID=2004:2006)
valid <- master[! (master[, 1] %in% train[ ,1]), , drop=FALSE]

Hope this helps.

Petr Savicky.

From mailinglist.honeypot at gmail.com Wed May 4 22:19:43 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Wed, 4 May 2011 16:19:43 -0400
Subject: [R] join tables in R
In-Reply-To: <05A5593D7A426542B6F9D8C3B538CF9C156EB6B301@USEXCPWM07.mc01.unicreditgroup.eu>
References: <05A5593D7A426542B6F9D8C3B538CF9C156EB6B301@USEXCPWM07.mc01.unicreditgroup.eu>
Message-ID:

Hi,

On Wed, May 4, 2011 at 10:32 AM, Roccato Alfredo (UniCredit) wrote:
> I'd to match-merge 2 tables in such a manner that I keep all the rows in table 1, but not the rows that are in both table 1 and 2.
> Thank you for your help,
> Alfredo
>
>> master <- data.frame(ID=2001:2011)
>> train  <- data.frame(ID=2004:2006)
>> valid <- ???
>
> in this example table valid should have the following
>
>> str(valid)
>  Year: int  2001 2002 2003 2007 2008 2009 2010 2011

Are you working with only one column at a time? If so:

R> keep <- !(master$ID %in% train$ID)
R> valid <- master[keep, , drop=FALSE]  ## drop=FALSE keeps a data.frame even with a single column

If you are working with combinations of columns as the keys for each
row, there are other ways ...

> in SAS I'd do the following:
> data master; do id=2001 to 2011; output; end; run;
> data train; do id=2004 to 2006; output; end; run;
> data valid; merge master(in=a) train(in=b); by id; if a and not b; run;
>
> and in SQL:
> create table valid as
>  select a.* from master where ID not in (select ID from train)

My solution does pretty much what this select statement would do.
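
The single-key filter above, and one possible reading of the "other ways" for keys made of several columns, might look roughly like this. It is only a sketch: the `site` column, the `master2`/`train2` data frames and the `row_key` helper are hypothetical names invented for illustration, and pasting key columns into one string is just one of several approaches.

```r
# Anti-join sketch: keep rows of `master` that have no match in `train`.
master <- data.frame(ID = 2001:2011)
train  <- data.frame(ID = 2004:2006)

# Single key column: logical index via %in%; drop = FALSE keeps a data frame.
valid <- master[!(master$ID %in% train$ID), , drop = FALSE]

# Several key columns (hypothetical `site` column): build one string per row
# from the key columns and compare those strings instead.
master2 <- data.frame(ID = rep(2001:2003, 2),
                      site = rep(c("a", "b"), each = 3),
                      stringsAsFactors = FALSE)
train2  <- data.frame(ID = c(2001, 2003),
                      site = c("a", "b"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: one key string per row; "|" is assumed not to occur in the data.
row_key <- function(d) paste(d$ID, d$site, sep = "|")

valid2 <- master2[!(row_key(master2) %in% row_key(train2)), , drop = FALSE]

print(valid)
print(valid2)
```

Neither version requires `train` to be a subset of `master`; rows of `train` with no counterpart in `master` are simply ignored.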
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From eriki at ccbr.umn.edu Wed May 4 22:22:28 2011
From: eriki at ccbr.umn.edu (Erik Iverson)
Date: Wed, 04 May 2011 15:22:28 -0500
Subject: [R] SAPPLY function XXXX
In-Reply-To:
References:
Message-ID: <4DC1B584.6040201@ccbr.umn.edu>

Dan,

> I am attempting to write a function to count the number of non-missing
> values of each column in a data frame using the sapply function. I have the
> following code which is receiving the error message below.
>
>
>> n.valid<-sapply(data1,sum(!is.na))
> Error in !is.na : invalid argument type

The second argument is the FUN argument to sapply, which expects a function.
is.na is indeed a function, but !is.na is not a function:

> !is.na
Error in !is.na : invalid argument type

You need to write your own function to do what you want. Luckily this
is easy. Let's write one to count the number of non-missing values in a
vector.

countValid <- function(x) {
   sum(!is.na(x))
}

Now you have a function that does what you want, so you can use sapply
with it.

sapply(data1, countValid)

You could also do an anonymous (unnamed) function within sapply to the
same effect.

sapply(data1, function(x) sum(!is.na(x)))

NB: none of this is tested!

--Erik

>
> Ultimately, I would like for this to be 1 conponent in a larger function
> that will produce PROC CONTENTS style output. Something like...
>
> data1.contents<-data.frame(Variable=names(data1),
> Class=sapply(data1,class),
> n.valid=sapply(data1,sum(!is.na)),
> n.miss=sapply(data1,sum(is.na)))
> data1.contents
>
> Any suggestions/assistance are appreciated.
> > Thank you, > > Daniel > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From eriki at ccbr.umn.edu Wed May 4 22:26:59 2011 From: eriki at ccbr.umn.edu (Erik Iverson) Date: Wed, 04 May 2011 15:26:59 -0500 Subject: [R] SAPPLY function XXXX In-Reply-To: References: Message-ID: <4DC1B693.9020508@ccbr.umn.edu> > Ultimately, I would like for this to be 1 conponent in a larger function > that will produce PROC CONTENTS style output. Something like... > > data1.contents<-data.frame(Variable=names(data1), > Class=sapply(data1,class), > n.valid=sapply(data1,sum(!is.na)), > n.miss=sapply(data1,sum(is.na))) > data1.contents Also meant to mention to see ?describe in the Hmisc package: E.g., > describe(c(NA, 1:10)) There is also a useful method for data.frame objects. --Erik From jerome.asselin.stat at gmail.com Wed May 4 22:31:29 2011 From: jerome.asselin.stat at gmail.com (Jerome Asselin) Date: Wed, 4 May 2011 16:31:29 -0400 Subject: [R] Regexp question In-Reply-To: References: Message-ID: <1304541089.5275.47.camel@localhost> On Wed, 2011-05-04 at 22:41 +0300, johannes rara wrote: > I have a string like this > > st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), > COUNT(salary), FROM Employees" > > How can I remove the last comma before the FROM statement? 
gsub(",[^,]*FROM ", " FROM ", st)

HTH,
Jerome

From r at catwhisker.org Wed May 4 22:33:16 2011
From: r at catwhisker.org (David Wolfskill)
Date: Wed, 4 May 2011 13:33:16 -0700
Subject: [R] Regexp question
In-Reply-To:
References:
Message-ID: <20110504203316.GF2037@albert.catwhisker.org>

On Wed, May 04, 2011 at 10:41:36PM +0300, johannes rara wrote:
> I have a string like this
>
> st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname),
> COUNT(salary), FROM Employees"
>
> How can I remove the last comma before the FROM statement?

This doesn't use a regex, per se, but:

> st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), COUNT(salary), FROM Employees"
> st
[1] "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), COUNT(salary), FROM Employees"
> sub(", FROM", " FROM", st)
[1] "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), COUNT(salary) FROM Employees"
>

I'm not sure that's what you had in mind, though.

Peace,
david
--
David H. Wolfskill                              r at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.

From jeremy.miles at gmail.com Wed May 4 22:33:51 2011
From: jeremy.miles at gmail.com (Jeremy Miles)
Date: Wed, 4 May 2011 13:33:51 -0700
Subject: [R] best subset regression in R
In-Reply-To: <792436.61050.qm@web38308.mail.mud.yahoo.com>
References: <792436.61050.qm@web38308.mail.mud.yahoo.com>
Message-ID:

On 4 May 2011 09:47, FMH wrote:
> Dear All,
>
> Could someone please give some advice the way to do linear modelling via best subset regression in R? I'd really appreciate for your kindness.
>

Google is your friend here:
http://www.google.com/search?q=best+subsets+regression+R , and sends
me to this page: http://www.statmethods.net/stats/regression.html

Jeremy

--
Jeremy Miles
Support Dan and Alex's school: Vote for Goethe Charter School to
receive a grant from Pepsi to help build a library:
http://www.refresheverything.com/gicslibrary

From ivan.calandra at uni-hamburg.de Wed May 4 22:38:03 2011
From: ivan.calandra at uni-hamburg.de (Ivan Calandra)
Date: Wed, 4 May 2011 22:38:03 +0200
Subject: [R] Str info. Thanks for helping
In-Reply-To: <5188382.61183.1304533027791.JavaMail.nabble@joe.nabble.com>
References: <5188382.61183.1304533027791.JavaMail.nabble@joe.nabble.com>
Message-ID:

It looks from str(SA) that Response IPS1 is a data.frame of class "anova", which probably cannot be coerced to vector.
Maybe you can use unlist() instead of as.vector()
Or something like
SA[[" Response IPS1"]]["as.factor(WSD)",] ## to select the first row only, even maybe with unlist(); the element name starts with a space, as your attributes() output shows

Without a better REPRODUCIBLE example, I cannot tell more (maybe some others can, that's why I reply to the list)

HTH,
Ivan

On 4 May 2011, at 20:17, reith_william at bah.com wrote:

> I am still waiting for this to get posted so I thought I would email it to you.
>
> SA gives the output:
>
>  Response IPS1 :
>                 Df Sum Sq Mean Sq F value    Pr(>F)
> as.factor(WSD)   3 3.3136 1.10455  23.047 5.19e-12 ***
> Residuals      129 6.1823 0.04793
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> .
> .
> .
> There are 11 more just like this output. Just increment IPS1 to IPS2, etc.
>
>
> Goal: save "3 3.3136 1.10455 23.047 5.19e-12" as a vector.
>
>
> Str(SA) gives the output:
>
> str(SA)
>> str(SA)
> List of 12
> " $ Response IPS1 :Classes 'anova' and 'data.frame': 2 obs.
of 5 variables:"
> ..$ Df : num [1:2] 3 129
> ..$ Sum Sq : num [1:2] 3.31 6.18
> ..$ Mean Sq: num [1:2] 1.1045 0.0479
> ..$ F value: num [1:2] 23 NA
> ..$ Pr(>F) : num [1:2] 5.19e-12 NA
>
>
> There are several more but they are just repeats of this one only with IPS2, IPS3,...
>
> The command:
>
>> SA1<-as.vector(SA$"Reponse IPS1")
>
> Returns
>
>> NULL
>
> As do several variations I have tried. Any ideas.

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Institut und Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

From patrick.breheny at uky.edu Wed May 4 22:40:39 2011
From: patrick.breheny at uky.edu (Patrick Breheny)
Date: Wed, 4 May 2011 16:40:39 -0400
Subject: [R] Box-Cox transformation in R
In-Reply-To: <435617.42947.qm@web38305.mail.mud.yahoo.com>
References: <435617.42947.qm@web38305.mail.mud.yahoo.com>
Message-ID: <4DC1B9C7.3080304@uky.edu>

On 05/04/2011 12:53 PM, FMH wrote:
> Hi,
>
> Could any one please help how I can transform data based on Box-Cox Transformations in R.
>
> Any helps will be much appreciated.
>
> thanks,
> Kagba
> [[alternative HTML version deleted]]
>

See the boxcox function in the MASS package.

_______________________
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

From johannesraja at gmail.com Wed May 4 22:41:34 2011
From: johannesraja at gmail.com (johannes rara)
Date: Wed, 4 May 2011 23:41:34 +0300
Subject: [R] Regexp question
In-Reply-To: <20110504203316.GF2037@albert.catwhisker.org>
References: <20110504203316.GF2037@albert.catwhisker.org>
Message-ID:

Thank you all!
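
For reference, the suggestions made in this thread can be folded into a single call. The following is a minimal sketch, assuming the goal is to drop only the comma that directly precedes FROM, allowing for any whitespace in between (including the line break in the string as originally posted). The use of `perl = TRUE` and the `\\b` word boundary are assumptions of this sketch, not something the posters used.

```r
# The query string from the thread, with the line break after COUNT(empname),
st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname),\nCOUNT(salary), FROM Employees"

# Remove a comma that is followed by optional whitespace and the word FROM,
# leaving a single space before FROM. sub() is enough: there is one FROM.
fixed <- sub(",\\s*FROM\\b", " FROM", st, perl = TRUE)

cat(fixed, "\n")
```

The commas between the COUNT() terms are untouched because none of them is followed (after whitespace only) by FROM.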
2011/5/4 David Wolfskill : > On Wed, May 04, 2011 at 10:41:36PM +0300, johannes rara wrote: >> I have a string like this >> >> st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), >> COUNT(salary), FROM Employees" >> >> How can I remove the last comma before the FROM statement? > > This doesn't use a regex, per se, but: > >> st <- "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), COUNT(salary), FROM Employees" >> st > [1] "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), COUNT(salary), FROM Employees" >> sub(", FROM", " FROM", st) > [1] "SELECT COUNT(empid), COUNT(mgrid), COUNT(empname), COUNT(salary) FROM Employees" >> > > I'm not sure that's what you had in mind, though. > > Peace, > david > -- > David H. Wolfskill ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?r at catwhisker.org > Depriving a girl or boy of an opportunity for education is evil. > > See http://www.catwhisker.org/~david/publickey.gpg for my public key. > From andrew.decker.steen at gmail.com Wed May 4 22:43:09 2011 From: andrew.decker.steen at gmail.com (Andrew D. Steen) Date: Wed, 4 May 2011 22:43:09 +0200 Subject: [R] what happens when I store linear models in an array? In-Reply-To: <69089E95-E117-4C0E-92FB-3619677DB338@comcast.net> References: <4dc13d51.c9860e0a.331c.2bf6@mx.google.com> <69089E95-E117-4C0E-92FB-3619677DB338@comcast.net> Message-ID: <4dc1ba56.41cad80a.066d.5bef@mx.google.com> Thanks all, this is very helpful. --Andrew Steen > -----Original Message----- > From: David Winsemius [mailto:dwinsemius at comcast.net] > Sent: Wednesday, May 04, 2011 4:35 PM > To: Andrew D. Steen > Cc: r-help at r-project.org > Subject: Re: [R] what happens when I store linear models in an array? > > > On May 4, 2011, at 4:49 AM, Andrew D. Steen wrote: > > > I've got a bunch of similar datasets, all of which I've fit to linear > > models. I'd like to easily create arrays of a specific parameter > > from each > > linear model (e.g., all of the intercepts in one array). 
I figured > > I'd put > > the model objects into an array, and then (somehow) I could easily > > create > > corresponding arrays of intercepts or residuals or whatever, but I > > can't the > > parameters back out. > > > > Right now I've stored the model objects in a 2-D array: > >> lms.ASP <- array(list(), c(3,4)) > > > > Then I fill the array element-by-element: > >> surf105.lm. ASP <- lm(ASP ~ time) > >> lms.ASP[1,1] <- list(surf105.lm.ASP) > > > > Something is successfully being stored in the array: > >> test <- lms.tx.ASP[1,1] > >> test > > [[1]] > > Call: > > lm(formula = ASP ~ time) > > Coefficients: > > (Intercept) elapsed.time..hr > > 0.430732 0.004073 > > > > But I can't seem to call extraction functions on the linear models: > >> fitted(lms.ASP[1,1]) > > NUL > > > > It seems like something less than the actual linear model object is > > being > > stored in the array, but I don't understand what's happening, or how > > to > > easily batch-extract parameters of linear models. Any advice? > > > > The problem is that the "[" function is returning a sublist from that > array of lists, which is still a list. You wanted the contents of the > first (and only) element of that list and Andrew Robinson offered you > the solution. > > -- > > David Winsemius, MD > Heritage Laboratories > West Hartford, CT From rohitpandey576 at gmail.com Wed May 4 23:52:55 2011 From: rohitpandey576 at gmail.com (Rohit Pandey) Date: Thu, 5 May 2011 03:22:55 +0530 Subject: [R] help with the maxBHHH routine In-Reply-To: <20110503231408.GQ48756@ms.unimelb.edu.au> References: <20110503231408.GQ48756@ms.unimelb.edu.au> Message-ID: Hi Andrew, Ravi and Arne, Thank you so much for your prompt replies. I see that all of you mention the need for simple, reproducible code. I had thought of doing this, but the functions I was using for the observation level gradient and likelihood function were very long. I will paste them below here. 
Also, sorry for the ambiguity with the "1000's of observations and 821 parameters" on the one hand and the 10 * 2 matrix on the other. The latter is a toy data set and the former is the real data set I ultimately hope to apply this routine to once it works. Also, sorry for not mentioning the fact that the maxBHHH function I am using is from the maxLik package (thanks, Ravi for pointing out). So, the code that is giving me the errors is: maxBHHH(logLikALS4,grad=nuGradientC4,finalHessian="BHHH",start=prm,iterlim=2) and maxBHHH(logLikALS4,grad=nuGradientC4,finalHessian="BHHH",start=prm,iterlim=2) Where nuGradientC4 returns a 2*10 matrix and nuGradientC5 a 10*2 matrix (there are 10 parameters and 2 observations). I have attached the required functions in the .R file. These make for some pretty long code, but all you have to do is either load the file or paste the contents into your R console (and maybe see that they're returning what they're supposed to). I'm sorry I couldn't think of a way to come up with a shorter version of this code (I tried my best). 
Once you load the file, you should see the following: #The observation level likelihood function > logLikALS4(prm) 1 2 -0.6931472 -0.6931472 #The observation level gradients > nuGradientC4(prm) 1 2 3 4 5 6 7 8 9 10 2 -0.3518519 0.3518519 0.0000000 0 -0.1481481 -0.1666667 0.1481481 0.1666667 0.0000000 0.0000000 4 0.0000000 -0.3518519 0.3518519 0 0.0000000 0.0000000 -0.1666667 -0.1481481 0.1666667 0.1481481 Warning messages: 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' > nuGradientC5(prm) 2 4 1 -0.3518519 0.0000000 2 0.3518519 -0.3518519 3 0.0000000 0.3518519 4 0.0000000 0.0000000 5 -0.1481481 0.0000000 6 -0.1666667 0.0000000 7 0.1481481 -0.1666667 8 0.1666667 -0.1481481 9 0.0000000 0.1666667 10 0.0000000 0.1481481 Warning messages: 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' Ignore the warning messages. 
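
A note on the two shapes printed above. BHHH approximates the Hessian by the sum of outer products of the per-observation gradients, so with a gradient matrix G that has one row per observation and one column per parameter, the approximation is t(G) %*% G. The sketch below uses made-up random gradients (not the nuGradientC4 output) to illustrate that this K x K matrix cannot have rank above the number of observations N. That is one plausible reason why a toy data set with 2 observations and 10 parameters is not usable with BHHH in either orientation; whether this is the exact motivation behind the check in maxLik is an assumption.

```r
# Illustration with simulated gradients; all objects here are hypothetical.
set.seed(1)
K <- 10                                          # number of parameters

grad_small <- matrix(rnorm(2 * K),  nrow = 2)    # N = 2 observations,  2 x K
grad_large <- matrix(rnorm(50 * K), nrow = 50)   # N = 50 observations, 50 x K

# Outer-product-of-gradients matrices, both K x K.
opg_small <- crossprod(grad_small)               # same as t(G) %*% G
opg_large <- crossprod(grad_large)

# Numerical rank: bounded by N for the small case, so the matrix is singular.
rank_small <- qr(opg_small)$rank
rank_large <- qr(opg_large)$rank

cat("rank with N = 2: ", rank_small, "of", K, "\n")
cat("rank with N = 50:", rank_large, "of", K, "\n")
```

Under this reading, the orientation with rows as observations is the one the first error message asks for, and the toy example would need at least as many observations as parameters before the routine can proceed.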
The errors are: > maxBHHH(logLikALS4,grad=nuGradientC4,finalHessian="BHHH",start=prm,iterlim=2) Error in checkBhhhGrad(g = gr, theta = theta, analytic = (!is.null(attr(f, : the matrix returned by the gradient function (argument 'grad') must have at least as many rows as the number of parameters (10), where each row must correspond to the gradients of the log-likelihood function of an individual (independent) observation: currently, there are (is) 10 parameter(s) but the gradient matrix has only 2 row(s) In addition: Warning messages: 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' and: > maxBHHH(logLikALS4,grad=nuGradientC5,finalHessian="BHHH",start=prm,iterlim=2) Error in gr[, fixed] <- NA : (subscript) logical subscript too long In addition: Warning messages: 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' Again, thanks for your patience and help. Rohit On Wed, May 4, 2011 at 4:44 AM, Andrew Robinson < A.Robinson at ms.unimelb.edu.au> wrote: > I suggest that you provide some commented, minimal, self-contained, > reproducible code. > > Cheers > > Andrew > > On Wed, May 04, 2011 at 02:23:29AM +0530, Rohit Pandey wrote: > > Hello R community, > > > > I have been using R's inbuilt maximum likelihood functions, for the > > different methods (NR, BFGS, etc). > > > > I have figured out how to use all of them except the maxBHHH function. > This > > one is different from the others as it requires an observation level > > gradient. 
> > > > I am using the following syntax: > > > > maxBHHH(logLik,grad=nuGradient,finalHessian="BHHH",start=prm,iterlim=2) > > > > where logLik is the likelihood function and returns a vector of > observation > > level likelihoods and nuGradient is a function that returns a matrix with > > each row corresponding to a single observation and the columns > corresponding > > to the gradient values for each parameter (as is mentioned in the online > > help). > > > > however, this gives me the following error: > > > > *Error in checkBhhhGrad(g = gr, theta = theta, analytic = > (!is.null(attr(f, > > : > > the matrix returned by the gradient function (argument 'grad') must > have > > at least as many rows as the number of parameters (10), where each row > must > > correspond to the gradients of the log-likelihood function of an > individual > > (independent) observation: > > currently, there are (is) 10 parameter(s) but the gradient matrix has > only > > 2 row(s) > > * > > It seems it is expecting as many rows as there are parameters. So, I > changed > > my likelihood function so that it would return the transpose of the > earlier > > matrix (hence returning a matrix with rows equaling parameters and > columns, > > observations). > > > > However, when I run the function again, I still get an error: > > *Error in gr[, fixed] <- NA : (subscript) logical subscript too long* > > > > I have verified that my gradient function, when summed across > observations > > gives the same results as the in built numerical gradient (to the 11th > > decimal place - after that, they differ since R's function is numerical). > > > > I am trying to run a very large estimation (1000's of observations and > 821 > > parameters) and all of the other methods are taking way too much time > > (days). This method is our last hope and so, any help will be greatly > > appreciated. 
> > > > -- > > Thanks in advance, > > Rohit > > Mob: 91 9819926213 > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > -- > Andrew Robinson > Program Manager, ACERA > Department of Mathematics and Statistics Tel: +61-3-8344-6410 > University of Melbourne, VIC 3010 Australia (prefer email) > http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 > http://www.acera.unimelb.edu.au/ > > Forest Analytics with R (Springer, 2011) > http://www.ms.unimelb.edu.au/FAwR/ > Introduction to Scientific Programming and Simulation using R (CRC, 2009): > http://www.ms.unimelb.edu.au/spuRs/ > -- Thanks, Rohit Mob: 91 9819926213 From dan.abner99 at gmail.com Wed May 4 22:28:28 2011 From: dan.abner99 at gmail.com (Dan Abner) Date: Wed, 4 May 2011 16:28:28 -0400 Subject: [R] SAPPLY function XXXX In-Reply-To: <4DC1B584.6040201@ccbr.umn.edu> References: <4DC1B584.6040201@ccbr.umn.edu> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From david.j.meehan at gmail.com Wed May 4 23:42:53 2011 From: david.j.meehan at gmail.com (Rovinpiper) Date: Wed, 4 May 2011 14:42:53 -0700 (PDT) Subject: [R] ANOVA 1 too few degrees of freedom In-Reply-To: <1304458392408-3493632.post@n4.nabble.com> References: <1304451451151-3493349.post@n4.nabble.com> <1304458392408-3493632.post@n4.nabble.com> Message-ID: <1304545373444-3496870.post@n4.nabble.com> This response went to my email: Without your data it's hard to say, but one possibility is that your plots are nested within treatments instead of crossed, or that you have something rather more cunning going on involving the Days. 
For example if you had 8 days for six of your plots and another 8 days for the remaining 6 plots, you may find that the total degrees of freedom aren't quite what you expected, as those subgroups need an intercept each. (I had this for a replicated latin square design - but I have to say that my problem then was that lm() gavbe me too many df and an apparently unbalanced design - I had to add the superset factor manually to get it right) Another possibility is that one of your plots has no data; try table(Combined.Plot) to check. -- View this message in context: http://r.789695.n4.nabble.com/ANOVA-1-too-few-degrees-of-freedom-tp3493349p3496870.html Sent from the R help mailing list archive at Nabble.com. From david.j.meehan at gmail.com Wed May 4 23:43:44 2011 From: david.j.meehan at gmail.com (Rovinpiper) Date: Wed, 4 May 2011 14:43:44 -0700 (PDT) Subject: [R] ANOVA 1 too few degrees of freedom In-Reply-To: <1304545373444-3496870.post@n4.nabble.com> References: <1304451451151-3493349.post@n4.nabble.com> <1304458392408-3493632.post@n4.nabble.com> <1304545373444-3496870.post@n4.nabble.com> Message-ID: <1304545424111-3496871.post@n4.nabble.com> And I responded as follows: Hi, Thanks for your advice. I tried using table() to check for missing data. Here are the results: > table(Combined.Plot) Combined.Plot 60m A1 B1 B3 B4 C5 C9 D2 D9 F60m F8 Q7 34 34 34 34 34 34 34 34 34 34 34 34 > table(Combined.Day) Combined.Day 1 2.5 7.5 8.5 10.5 12.5 14.5 17.5 18.5 19.5 21.5 24.5 29.5 32.5 37.5 50.5 79.5 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 So this seems to indicate that I have what I want. I have two respiration data points at each plot on each day. I set this data up so that I could make comparisons of all of the samples on the same day. That's why I have numbers like 2.5. I also made a table for comparison of Combined.Day and Combined.Plot. 
> cbind(Combined.Day, Combined.Plot) Combined.Day Combined.Plot 1 "1" "60m" 2.5 "2.5" "60m" 7.5 "7.5" "60m" 8.5 "8.5" "60m" 10.5 "10.5" "60m" 12.5 "12.5" "60m" 14.5 "14.5" "60m" 17.5 "17.5" "60m" 18.5 "18.5" "60m" 19.5 "19.5" "60m" 21.5 "21.5" "60m" 24.5 "24.5" "60m" 29.5 "29.5" "60m" 32.5 "32.5" "60m" 37.5 "37.5" "60m" 50.5 "50.5" "60m" 79.5 "79.5" "60m" 1 "1" "A1" 2.5 "2.5" "A1" 7.5 "7.5" "A1" 8.5 "8.5" "A1" 10.5 "10.5" "A1" 12.5 "12.5" "A1" 14.5 "14.5" "A1" 17.5 "17.5" "A1" 18.5 "18.5" "A1" 19.5 "19.5" "A1" 21.5 "21.5" "A1" 24.5 "24.5" "A1" 29.5 "29.5" "A1" 32.5 "32.5" "A1" 37.5 "37.5" "A1" 50.5 "50.5" "A1" 79.5 "79.5" "A1" 1 "1" "B1" 2.5 "2.5" "B1" 7.5 "7.5" "B1" 8.5 "8.5" "B1" 10.5 "10.5" "B1" 12.5 "12.5" "B1" 14.5 "14.5" "B1" 17.5 "17.5" "B1" 18.5 "18.5" "B1" 19.5 "19.5" "B1" 21.5 "21.5" "B1" 24.5 "24.5" "B1" 29.5 "29.5" "B1" 32.5 "32.5" "B1" 37.5 "37.5" "B1" 50.5 "50.5" "B1" 79.5 "79.5" "B1" 1 "1" "B3" 2.5 "2.5" "B3" 7.5 "7.5" "B3" 8.5 "8.5" "B3" 10.5 "10.5" "B3" 12.5 "12.5" "B3" 14.5 "14.5" "B3" 17.5 "17.5" "B3" 18.5 "18.5" "B3" 19.5 "19.5" "B3" 21.5 "21.5" "B3" 24.5 "24.5" "B3" 29.5 "29.5" "B3" 32.5 "32.5" "B3" 37.5 "37.5" "B3" 50.5 "50.5" "B3" 79.5 "79.5" "B3" 1 "1" "B4" 2.5 "2.5" "B4" 7.5 "7.5" "B4" 8.5 "8.5" "B4" 10.5 "10.5" "B4" 12.5 "12.5" "B4" 14.5 "14.5" "B4" 17.5 "17.5" "B4" 18.5 "18.5" "B4" 19.5 "19.5" "B4" 21.5 "21.5" "B4" 24.5 "24.5" "B4" 29.5 "29.5" "B4" 32.5 "32.5" "B4" 37.5 "37.5" "B4" 50.5 "50.5" "B4" 79.5 "79.5" "B4" 1 "1" "C5" 2.5 "2.5" "C5" 7.5 "7.5" "C5" 8.5 "8.5" "C5" 10.5 "10.5" "C5" 12.5 "12.5" "C5" 14.5 "14.5" "C5" 17.5 "17.5" "C5" 18.5 "18.5" "C5" 19.5 "19.5" "C5" 21.5 "21.5" "C5" 24.5 "24.5" "C5" 29.5 "29.5" "C5" 32.5 "32.5" "C5" 37.5 "37.5" "C5" 50.5 "50.5" "C5" 79.5 "79.5" "C5" 1 "1" "C9" 2.5 "2.5" "C9" 7.5 "7.5" "C9" 8.5 "8.5" "C9" 10.5 "10.5" "C9" 12.5 "12.5" "C9" 14.5 "14.5" "C9" 17.5 "17.5" "C9" 18.5 "18.5" "C9" 19.5 "19.5" "C9" 21.5 "21.5" "C9" 24.5 "24.5" "C9" 29.5 "29.5" "C9" 32.5 "32.5" "C9" 37.5 "37.5" "C9" 50.5 "50.5" 
"C9" 79.5 "79.5" "C9" 1 "1" "D2" 2.5 "2.5" "D2" 7.5 "7.5" "D2" 8.5 "8.5" "D2" 10.5 "10.5" "D2" 12.5 "12.5" "D2" 14.5 "14.5" "D2" 17.5 "17.5" "D2" 18.5 "18.5" "D2" 19.5 "19.5" "D2" 21.5 "21.5" "D2" 24.5 "24.5" "D2" 29.5 "29.5" "D2" 32.5 "32.5" "D2" 37.5 "37.5" "D2" 50.5 "50.5" "D2" 79.5 "79.5" "D2" 1 "1" "D9" 2.5 "2.5" "D9" 7.5 "7.5" "D9" 8.5 "8.5" "D9" 10.5 "10.5" "D9" 12.5 "12.5" "D9" 14.5 "14.5" "D9" 17.5 "17.5" "D9" 18.5 "18.5" "D9" 19.5 "19.5" "D9" 21.5 "21.5" "D9" 24.5 "24.5" "D9" 29.5 "29.5" "D9" 32.5 "32.5" "D9" 37.5 "37.5" "D9" 50.5 "50.5" "D9" 79.5 "79.5" "D9" 1 "1" "F60m" 2.5 "2.5" "F60m" 7.5 "7.5" "F60m" 8.5 "8.5" "F60m" 10.5 "10.5" "F60m" 12.5 "12.5" "F60m" 14.5 "14.5" "F60m" 17.5 "17.5" "F60m" 18.5 "18.5" "F60m" 19.5 "19.5" "F60m" 21.5 "21.5" "F60m" 24.5 "24.5" "F60m" 29.5 "29.5" "F60m" 32.5 "32.5" "F60m" 37.5 "37.5" "F60m" 50.5 "50.5" "F60m" 79.5 "79.5" "F60m" 1 "1" "F8" 2.5 "2.5" "F8" 7.5 "7.5" "F8" 8.5 "8.5" "F8" 10.5 "10.5" "F8" 12.5 "12.5" "F8" 14.5 "14.5" "F8" 17.5 "17.5" "F8" 18.5 "18.5" "F8" 19.5 "19.5" "F8" 21.5 "21.5" "F8" 24.5 "24.5" "F8" 29.5 "29.5" "F8" 32.5 "32.5" "F8" 37.5 "37.5" "F8" 50.5 "50.5" "F8" 79.5 "79.5" "F8" 1 "1" "Q7" 2.5 "2.5" "Q7" 7.5 "7.5" "Q7" 8.5 "8.5" "Q7" 10.5 "10.5" "Q7" 12.5 "12.5" "Q7" 14.5 "14.5" "Q7" 17.5 "17.5" "Q7" 18.5 "18.5" "Q7" 19.5 "19.5" "Q7" 21.5 "21.5" "Q7" 24.5 "24.5" "Q7" 29.5 "29.5" "Q7" 32.5 "32.5" "Q7" 37.5 "37.5" "Q7" 50.5 "50.5" "Q7" 79.5 "79.5" "Q7" 1 "1" "60m" 2.5 "2.5" "60m" 7.5 "7.5" "60m" 8.5 "8.5" "60m" 10.5 "10.5" "60m" 12.5 "12.5" "60m" 14.5 "14.5" "60m" 17.5 "17.5" "60m" 18.5 "18.5" "60m" 19.5 "19.5" "60m" 21.5 "21.5" "60m" 24.5 "24.5" "60m" 29.5 "29.5" "60m" 32.5 "32.5" "60m" 37.5 "37.5" "60m" 50.5 "50.5" "60m" 79.5 "79.5" "60m" 1 "1" "A1" 2.5 "2.5" "A1" 7.5 "7.5" "A1" 8.5 "8.5" "A1" 10.5 "10.5" "A1" 12.5 "12.5" "A1" 14.5 "14.5" "A1" 17.5 "17.5" "A1" 18.5 "18.5" "A1" 19.5 "19.5" "A1" 21.5 "21.5" "A1" 24.5 "24.5" "A1" 29.5 "29.5" "A1" 32.5 "32.5" "A1" 37.5 "37.5" "A1" 50.5 "50.5" "A1" 79.5 
"79.5" "A1" 1 "1" "B1" 2.5 "2.5" "B1" 7.5 "7.5" "B1" 8.5 "8.5" "B1" 10.5 "10.5" "B1" 12.5 "12.5" "B1" 14.5 "14.5" "B1" 17.5 "17.5" "B1" 18.5 "18.5" "B1" 19.5 "19.5" "B1" 21.5 "21.5" "B1" 24.5 "24.5" "B1" 29.5 "29.5" "B1" 32.5 "32.5" "B1" 37.5 "37.5" "B1" 50.5 "50.5" "B1" 79.5 "79.5" "B1" 1 "1" "B3" 2.5 "2.5" "B3" 7.5 "7.5" "B3" 8.5 "8.5" "B3" 10.5 "10.5" "B3" 12.5 "12.5" "B3" 14.5 "14.5" "B3" 17.5 "17.5" "B3" 18.5 "18.5" "B3" 19.5 "19.5" "B3" 21.5 "21.5" "B3" 24.5 "24.5" "B3" 29.5 "29.5" "B3" 32.5 "32.5" "B3" 37.5 "37.5" "B3" 50.5 "50.5" "B3" 79.5 "79.5" "B3" 1 "1" "B4" 2.5 "2.5" "B4" 7.5 "7.5" "B4" 8.5 "8.5" "B4" 10.5 "10.5" "B4" 12.5 "12.5" "B4" 14.5 "14.5" "B4" 17.5 "17.5" "B4" 18.5 "18.5" "B4" 19.5 "19.5" "B4" 21.5 "21.5" "B4" 24.5 "24.5" "B4" 29.5 "29.5" "B4" 32.5 "32.5" "B4" 37.5 "37.5" "B4" 50.5 "50.5" "B4" 79.5 "79.5" "B4" 1 "1" "C5" 2.5 "2.5" "C5" 7.5 "7.5" "C5" 8.5 "8.5" "C5" 10.5 "10.5" "C5" 12.5 "12.5" "C5" 14.5 "14.5" "C5" 17.5 "17.5" "C5" 18.5 "18.5" "C5" 19.5 "19.5" "C5" 21.5 "21.5" "C5" 24.5 "24.5" "C5" 29.5 "29.5" "C5" 32.5 "32.5" "C5" 37.5 "37.5" "C5" 50.5 "50.5" "C5" 79.5 "79.5" "C5" 1 "1" "C9" 2.5 "2.5" "C9" 7.5 "7.5" "C9" 8.5 "8.5" "C9" 10.5 "10.5" "C9" 12.5 "12.5" "C9" 14.5 "14.5" "C9" 17.5 "17.5" "C9" 18.5 "18.5" "C9" 19.5 "19.5" "C9" 21.5 "21.5" "C9" 24.5 "24.5" "C9" 29.5 "29.5" "C9" 32.5 "32.5" "C9" 37.5 "37.5" "C9" 50.5 "50.5" "C9" 79.5 "79.5" "C9" 1 "1" "D2" 2.5 "2.5" "D2" 7.5 "7.5" "D2" 8.5 "8.5" "D2" 10.5 "10.5" "D2" 12.5 "12.5" "D2" 14.5 "14.5" "D2" 17.5 "17.5" "D2" 18.5 "18.5" "D2" 19.5 "19.5" "D2" 21.5 "21.5" "D2" 24.5 "24.5" "D2" 29.5 "29.5" "D2" 32.5 "32.5" "D2" 37.5 "37.5" "D2" 50.5 "50.5" "D2" 79.5 "79.5" "D2" 1 "1" "D9" 2.5 "2.5" "D9" 7.5 "7.5" "D9" 8.5 "8.5" "D9" 10.5 "10.5" "D9" 12.5 "12.5" "D9" 14.5 "14.5" "D9" 17.5 "17.5" "D9" 18.5 "18.5" "D9" 19.5 "19.5" "D9" 21.5 "21.5" "D9" 24.5 "24.5" "D9" 29.5 "29.5" "D9" 32.5 "32.5" "D9" 37.5 "37.5" "D9" 50.5 "50.5" "D9" 79.5 "79.5" "D9" 1 "1" "F60m" 2.5 "2.5" "F60m" 7.5 "7.5" "F60m" 
8.5 "8.5" "F60m" 10.5 "10.5" "F60m" 12.5 "12.5" "F60m" 14.5 "14.5" "F60m" 17.5 "17.5" "F60m" 18.5 "18.5" "F60m" 19.5 "19.5" "F60m" 21.5 "21.5" "F60m" 24.5 "24.5" "F60m" 29.5 "29.5" "F60m" 32.5 "32.5" "F60m" 37.5 "37.5" "F60m" 50.5 "50.5" "F60m" 79.5 "79.5" "F60m" 1 "1" "F8" 2.5 "2.5" "F8" 7.5 "7.5" "F8" 8.5 "8.5" "F8" 10.5 "10.5" "F8" 12.5 "12.5" "F8" 14.5 "14.5" "F8" 17.5 "17.5" "F8" 18.5 "18.5" "F8" 19.5 "19.5" "F8" 21.5 "21.5" "F8" 24.5 "24.5" "F8" 29.5 "29.5" "F8" 32.5 "32.5" "F8" 37.5 "37.5" "F8" 50.5 "50.5" "F8" 79.5 "79.5" "F8" 1 "1" "Q7" 2.5 "2.5" "Q7" 7.5 "7.5" "Q7" 8.5 "8.5" "Q7" 10.5 "10.5" "Q7" 12.5 "12.5" "Q7" 14.5 "14.5" "Q7" 17.5 "17.5" "Q7" 18.5 "18.5" "Q7" 19.5 "19.5" "Q7" 21.5 "21.5" "Q7" 24.5 "24.5" "Q7" 29.5 "29.5" "Q7" 32.5 "32.5" "Q7" 37.5 "37.5" "Q7" 50.5 "50.5" "Q7" 79.5 "79.5" "Q7" -- View this message in context: http://r.789695.n4.nabble.com/ANOVA-1-too-few-degrees-of-freedom-tp3493349p3496871.html Sent from the R help mailing list archive at Nabble.com. From evastrijbis at hotmail.com Wed May 4 23:20:53 2011 From: evastrijbis at hotmail.com (blue100) Date: Wed, 4 May 2011 14:20:53 -0700 (PDT) Subject: [R] how to replace all variable values? Message-ID: <1304544053882-3496838.post@n4.nabble.com> dear all, Im a complete R newby with the following question. I have a dataset where my variable values are incorrectly numbered. it has to be something like this Where x is variable name, y=actual value which must become corresponding z-value x3 x4 x5 x3 x4 x5 y1 y3 y4 z1 z3 z4 y2 must become z2 y5 z5 The y and z variables are matched by: x1 x2 y1 z1 y2 z2 y3 z3 y4 z4 y5 z5 Anybody any suggestion? All help is very much appreciated.. -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-all-variable-values-tp3496838p3496838.html Sent from the R help mailing list archive at Nabble.com. 
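One way to read the question above is as a lookup-table recode: every value in x3, x4 and x5 is looked up in x1 and replaced by the x2 entry on the same row. The sketch below assumes that reading; the data frames `lookup` and `dat` and the helper `recode()` are made up for illustration and are not taken from the poster's data.

```r
# Hypothetical lookup table: x1 holds the current (wrong) codes,
# x2 the values they should become.
lookup <- data.frame(x1 = c("y1", "y2", "y3", "y4", "y5"),
                     x2 = c("z1", "z2", "z3", "z4", "z5"),
                     stringsAsFactors = FALSE)

# Hypothetical data set with wrongly coded columns x3, x4, x5.
dat <- data.frame(x3 = c("y1", "y2", "y5"),
                  x4 = c("y3", NA, NA),
                  x5 = c("y4", NA, NA),
                  stringsAsFactors = FALSE)

# match() returns the row of lookup$x1 at which each value is found
# (NA if absent), so indexing lookup$x2 with it performs the recode.
recode <- function(v) lookup$x2[match(v, lookup$x1)]

# Apply the recode to every column while keeping the data frame shape.
dat[] <- lapply(dat, recode)
print(dat)
```

Values that do not occur in `lookup$x1` come back as NA, which makes unmatched codes easy to spot.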
From aikidasgupta at gmail.com Wed May 4 23:57:05 2011 From: aikidasgupta at gmail.com (Abhijit Dasgupta) Date: Wed, 04 May 2011 17:57:05 -0400 Subject: [R] combine lattice plot and standard R plot In-Reply-To: References: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es> Message-ID: <4DC1CBB1.8040609@araastat.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jan.kacaba at gmail.com Wed May 4 22:02:48 2011 From: jan.kacaba at gmail.com (derek) Date: Wed, 4 May 2011 13:02:48 -0700 (PDT) Subject: [R] print elements' placings from vector Message-ID: <1304539368841-3496645.post@n4.nabble.com> Dear R, Here is a code: z=c(1,3,0.5,6,8,10,2,2,3,4,7,3) z[z>2] I dont want print the elements, but theirs placings in vector. -- View this message in context: http://r.789695.n4.nabble.com/print-elements-placings-from-vector-tp3496645p3496645.html Sent from the R help mailing list archive at Nabble.com. From pjo at cisunix.unh.edu Wed May 4 22:33:12 2011 From: pjo at cisunix.unh.edu (Paul Ossenbruggen) Date: Wed, 4 May 2011 16:33:12 -0400 Subject: [R] fGarch Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From scttchamberlain4 at gmail.com Wed May 4 21:53:28 2011 From: scttchamberlain4 at gmail.com (Scott Chamberlain) Date: Wed, 4 May 2011 14:53:28 -0500 Subject: [R] merging multiple columns from two dataframes In-Reply-To: <1304531635175-3496341.post@n4.nabble.com> References: <1304531635175-3496341.post@n4.nabble.com> Message-ID: <88462C56F34E49D5A6BC990F142B61A7@gmail.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From ypriverol at gmail.com Wed May 4 22:18:03 2011 From: ypriverol at gmail.com (ypriverol) Date: Wed, 4 May 2011 13:18:03 -0700 (PDT) Subject: [R] Bigining with a Program of SVR In-Reply-To: <1304435997731-3492746.post@n4.nabble.com> References: <1304106463512-3484476.post@n4.nabble.com> <1304429881697-3492487.post@n4.nabble.com> <1304435997731-3492746.post@n4.nabble.com> Message-ID: <1304540283121-3496685.post@n4.nabble.com> How can I apply feature selection with caret and support vector regression. -- View this message in context: http://r.789695.n4.nabble.com/Bigining-with-a-Program-of-SVR-tp3484476p3496685.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Thu May 5 00:33:17 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 4 May 2011 15:33:17 -0700 Subject: [R] Box-Cox transformation in R In-Reply-To: <435617.42947.qm@web38305.mail.mud.yahoo.com> References: <435617.42947.qm@web38305.mail.mud.yahoo.com> Message-ID: Hi: Start here: library(sos) # Install first if necessary findFn('Box-Cox') This search finds 131 matches; the basic Box-Cox transformations for regression are found in the MASS and car packages. For other situations, consult the packages and functions identified from the sos search. HTH, Dennis On Wed, May 4, 2011 at 9:53 AM, FMH wrote: > Hi, > > Could any one please help how I can transform data based on Box-Cox Transformations in R. > > Any helps will be much appreciated. > > thanks, > Kagba > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
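As a minimal sketch of the MASS route mentioned above: boxcox() profiles the log-likelihood of a linear model over a grid of lambda values, and the maximiser can then be used to transform the response. The simulated x and y and the names `lambda_hat` and `y_bc` are assumptions for illustration only.

```r
library(MASS)  # provides boxcox()

set.seed(42)
x <- runif(100, 1, 10)
y <- (2 + x + rnorm(100, sd = 0.3))^2   # simulated, strictly positive response

fit <- lm(y ~ x)

# Profile log-likelihood over a grid of lambda values, without plotting.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Box-Cox transform of the response; lambda = 0 corresponds to log(y).
y_bc <- if (abs(lambda_hat) < 1e-8) log(y) else (y^lambda_hat - 1) / lambda_hat

# Refit on the transformed scale.
fit_bc <- lm(y_bc ~ x)
```

In practice a convenient value near `lambda_hat` (for example 0, 0.5 or 1) is often preferred to the exact maximiser, since it is easier to interpret.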
> > From f.harrell at vanderbilt.edu Thu May 5 00:39:28 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Wed, 4 May 2011 15:39:28 -0700 (PDT) Subject: [R] best subset regression in R In-Reply-To: <792436.61050.qm@web38308.mail.mud.yahoo.com> References: <792436.61050.qm@web38308.mail.mud.yahoo.com> Message-ID: <1304548768550-3496981.post@n4.nabble.com> Beware - this approach is a statistical train wreck. Been there, done that. If all you want is "an" answer it will save you a lot of time thinking, however. Frank FMH-4 wrote: > > Dear All, > ? > Could someone please give some advice the way to do linear modelling via > best subset regression in R? I'd really appreciate for your kindness. > ? > Thanks, > Kagba > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/best-subset-regression-in-R-tp3496671p3496981.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Thu May 5 00:46:26 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 4 May 2011 15:46:26 -0700 Subject: [R] print elements' placings from vector In-Reply-To: <1304539368841-3496645.post@n4.nabble.com> References: <1304539368841-3496645.post@n4.nabble.com> Message-ID: Hi: Is this what you're after? > z=c(1,3,0.5,6,8,10,2,2,3,4,7,3) > which(z > 2) [1] 2 4 5 6 9 10 11 12 HTH, Dennis On Wed, May 4, 2011 at 1:02 PM, derek wrote: > Dear R, > > Here is a code: > z=c(1,3,0.5,6,8,10,2,2,3,4,7,3) > z[z>2] > > I dont want print the elements, but theirs placings in vector. 
>
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/print-elements-placings-from-vector-tp3496645p3496645.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From djmuser at gmail.com Thu May 5 00:53:46 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Wed, 4 May 2011 15:53:46 -0700
Subject: [R] how to replace all variable values?
In-Reply-To: <1304544053882-3496838.post@n4.nabble.com>
References: <1304544053882-3496838.post@n4.nabble.com>
Message-ID:

Hi:

> x1 <- 1:5
> x2 <- 6:10
> x3 <- c(1, 2, 5)
> x4 <- 3
> x5 <- 4
> x2[x1 %in% x3]
[1]  6  7 10
> x2[x1 %in% x4]
[1] 8
> x2[x1 %in% x5]
[1] 9

HTH,
Dennis

On Wed, May 4, 2011 at 2:20 PM, blue100 wrote:
> dear all,
>
> Im a complete R newby with the following question.
>
> I have a dataset where my variable values are incorrectly numbered. it has
> to be something like this
>
> Where x is variable name, y=actual value which must become corresponding
> z-value
>
> x3      x4      x5                      x3      x4      x5
> y1      y3      y4                      z1      z3      z4
> y2              must become             z2
> y5                                      z5
>
> The y and z variables are matched by:
>
> x1      x2
> y1      z1
> y2      z2
> y3      z3
> y4      z4
> y5      z5
>
> Anybody any suggestion? All help is very much appreciated..
>
> --
> View this message in context: http://r.789695.n4.nabble.com/how-to-replace-all-variable-values-tp3496838p3496838.html
> Sent from the R help mailing list archive at Nabble.com.
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From djmuser at gmail.com Thu May 5 01:00:45 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 4 May 2011 16:00:45 -0700 Subject: [R] combine lattice plot and standard R plot In-Reply-To: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es> References: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es> Message-ID: Hi: If it's doable, you'll probably need the gridBase package. Fortunately, it has a nice vignette to get you started, which tells you at the end that there are limitations in compatibility between base and grid graphics (lattice is built on the latter). HTH, Dennis 2011/5/4 Lucia Ca?as : > Dear R users, > > I would like to combine lattice plot (xyplot) and standard R plot (plot and plotCI) in an unique figure. > > I use the function "par()" to combine plot and plotCI and I use the function "print()" to combine xyplot. I tried to use these functions to combine xyplot and plotCI and plots but they do not work. Does anybody know how I can do this? > > Thank you very much in advance. > > > > > Luc?a Ca??s Ferreiro > > Instituto Espa?ol de Oceanograf?a > Centro Oceanogr?fico de A coru?a > Paseo Mar?timo Alcalde Francisco V?zquez, 10 > 15001 - A Coru?a, Spain > > Tel: +34 981 218151 ?Fax: +34 981 229077 > lucia.canas at co.ieo.es > http://www.ieo.es > > > > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
>
>

From A.Robinson at ms.unimelb.edu.au Thu May 5 01:28:41 2011
From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson)
Date: Thu, 5 May 2011 09:28:41 +1000
Subject: [R] help with the maxBHHH routine
In-Reply-To:
References: <20110503231408.GQ48756@ms.unimelb.edu.au>
Message-ID: <20110504232841.GC827@ms.unimelb.edu.au>

Hi Rohit,

actually, the request for simple reproducible code means that you have to find the simplest possible representation of the problem.

What happens if you simplify the observation level gradient and the likelihood function? E.g., to trivial examples? If you still get the error, then simplify it further. If you get the error with the simplest possible problem, then share it. If you don't, then try to figure out what the changes were that resolved the problem, and scale those back up to your original problem.

Does that make sense?

Cheers

Andrew

On Thu, May 05, 2011 at 03:22:55AM +0530, Rohit Pandey wrote:
> Hi Andrew, Ravi and Arne,
>
> Thank you so much for your prompt replies. I see that all of you mention
> the need for simple, reproducible code. I had thought of doing this, but
> the functions I was using for the observation level gradient and
> likelihood function were very long. I will paste them below here.
>
> Also, sorry for the ambiguity with the "1000's of observations and 821
> parameters" on the one hand and the 10 * 2 matrix on the other. The latter
> is a toy data set and the former is the real data set I ultimately hope to
> apply this routine to once it works. Also, sorry for not mentioning the
> fact that the maxBHHH function I am using is from the maxLik package
> (thanks, Ravi for pointing out).
> So, the code that is giving me the errors is:
> maxBHHH(logLikALS4,grad=nuGradientC4,finalHessian="BHHH",start=prm,iterlim=2)
> and
> maxBHHH(logLikALS4,grad=nuGradientC5,finalHessian="BHHH",start=prm,iterlim=2)
> Where nuGradientC4 returns a 2*10 matrix and nuGradientC5 a 10*2 matrix
> (there are 10 parameters and 2 observations).
> I have attached the required functions in the .R file. > These make for some pretty long code, but all you have to do is either > load the file or paste the contents into your R console (and maybe see > that they're returning what they're supposed to). I'm sorry I couldn't > think of a way to come up with a shorter version of this code (I tried my > best). > > Once you load the file, you should see the following: > > #The observation level likelihood function > > logLikALS4(prm) > 1 2 > -0.6931472 -0.6931472 > > #The observation level gradients > > nuGradientC4(prm) > 1 2 3 4 5 6 7 8 9 10 > 2 -0.3518519 0.3518519 0.0000000 0 -0.1481481 -0.1666667 0.1481481 > 0.1666667 0.0000000 0.0000000 > 4 0.0000000 -0.3518519 0.3518519 0 0.0000000 0.0000000 -0.1666667 > -0.1481481 0.1666667 0.1481481 > Warning messages: > 1: In [1]is.na(x) : [2]is.na() applied to non-(list or vector) of type > 'NULL' > 2: In [3]is.na(x) : [4]is.na() applied to non-(list or vector) of type > 'NULL' > > > nuGradientC5(prm) > 2 4 > 1 -0.3518519 0.0000000 > 2 0.3518519 -0.3518519 > 3 0.0000000 0.3518519 > 4 0.0000000 0.0000000 > 5 -0.1481481 0.0000000 > 6 -0.1666667 0.0000000 > 7 0.1481481 -0.1666667 > 8 0.1666667 -0.1481481 > 9 0.0000000 0.1666667 > 10 0.0000000 0.1481481 > Warning messages: > 1: In [5]is.na(x) : [6]is.na() applied to non-(list or vector) of type > 'NULL' > 2: In [7]is.na(x) : [8]is.na() applied to non-(list or vector) of type > 'NULL' > > Ignore the warning messages. 
> > The errors are: > > > > maxBHHH(logLikALS4,grad=nuGradientC4,finalHessian="BHHH",start=prm,iterlim=2) > Error in checkBhhhGrad(g = gr, theta = theta, analytic = (!is.null(attr(f, > : > the matrix returned by the gradient function (argument 'grad') must have > at least as many rows as the number of parameters (10), where each row > must correspond to the gradients of the log-likelihood function of an > individual (independent) observation: > currently, there are (is) 10 parameter(s) but the gradient matrix has only > 2 row(s) > In addition: Warning messages: > 1: In [9]is.na(x) : [10]is.na() applied to non-(list or vector) of type > 'NULL' > 2: In [11]is.na(x) : [12]is.na() applied to non-(list or vector) of type > 'NULL' > > and: > > > > maxBHHH(logLikALS4,grad=nuGradientC5,finalHessian="BHHH",start=prm,iterlim=2) > Error in gr[, fixed] <- NA : (subscript) logical subscript too long > In addition: Warning messages: > 1: In [13]is.na(x) : [14]is.na() applied to non-(list or vector) of type > 'NULL' > 2: In [15]is.na(x) : [16]is.na() applied to non-(list or vector) of type > 'NULL' > > Again, thanks for your patience and help. > > Rohit > On Wed, May 4, 2011 at 4:44 AM, Andrew Robinson > <[17]A.Robinson at ms.unimelb.edu.au> wrote: > > I suggest that you provide some commented, minimal, self-contained, > reproducible code. > > Cheers > > Andrew > On Wed, May 04, 2011 at 02:23:29AM +0530, Rohit Pandey wrote: > > Hello R community, > > > > I have been using R's inbuilt maximum likelihood functions, for the > > different methods (NR, BFGS, etc). > > > > I have figured out how to use all of them except the maxBHHH function. > This > > one is different from the others as it requires an observation level > > gradient. 
> > > > I am using the following syntax: > > > > > maxBHHH(logLik,grad=nuGradient,finalHessian="BHHH",start=prm,iterlim=2) > > > > where logLik is the likelihood function and returns a vector of > observation > > level likelihoods and nuGradient is a function that returns a matrix > with > > each row corresponding to a single observation and the columns > corresponding > > to the gradient values for each parameter (as is mentioned in the > online > > help). > > > > however, this gives me the following error: > > > > *Error in checkBhhhGrad(g = gr, theta = theta, analytic = > (!is.null(attr(f, > > : > > the matrix returned by the gradient function (argument 'grad') must > have > > at least as many rows as the number of parameters (10), where each row > must > > correspond to the gradients of the log-likelihood function of an > individual > > (independent) observation: > > currently, there are (is) 10 parameter(s) but the gradient matrix has > only > > 2 row(s) > > * > > It seems it is expecting as many rows as there are parameters. So, I > changed > > my likelihood function so that it would return the transpose of the > earlier > > matrix (hence returning a matrix with rows equaling parameters and > columns, > > observations). > > > > However, when I run the function again, I still get an error: > > *Error in gr[, fixed] <- NA : (subscript) logical subscript too long* > > > > I have verified that my gradient function, when summed across > observations > > gives the same results as the in built numerical gradient (to the 11th > > decimal place - after that, they differ since R's function is > numerical). > > > > I am trying to run a very large estimation (1000's of observations and > 821 > > parameters) and all of the other methods are taking way too much time > > (days). This method is our last hope and so, any help will be greatly > > appreciated. 
> > > > -- > > Thanks in advance, > > Rohit > > Mob: 91 9819926213 > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > [18]R-help at r-project.org mailing list > > [19]https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > [20]http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > -- > Andrew Robinson > Program Manager, ACERA > Department of Mathematics and Statistics Tel: +61-3-8344-6410 > University of Melbourne, VIC 3010 Australia (prefer email) > [21]http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 > [22]http://www.acera.unimelb.edu.au/ > > Forest Analytics with R (Springer, 2011) > [23]http://www.ms.unimelb.edu.au/FAwR/ > Introduction to Scientific Programming and Simulation using R (CRC, > 2009): > [24]http://www.ms.unimelb.edu.au/spuRs/ > > -- > Thanks, > Rohit > Mob: 91 9819926213 > > References > > Visible links > 1. http://is.na/ > 2. http://is.na/ > 3. http://is.na/ > 4. http://is.na/ > 5. http://is.na/ > 6. http://is.na/ > 7. http://is.na/ > 8. http://is.na/ > 9. http://is.na/ > 10. http://is.na/ > 11. http://is.na/ > 12. http://is.na/ > 13. http://is.na/ > 14. http://is.na/ > 15. http://is.na/ > 16. http://is.na/ > 17. mailto:A.Robinson at ms.unimelb.edu.au > 18. mailto:R-help at r-project.org > 19. https://stat.ethz.ch/mailman/listinfo/r-help > 20. http://www.r-project.org/posting-guide.html > 21. http://www.ms.unimelb.edu.au/%7Eandrewpr > 22. http://www.acera.unimelb.edu.au/ > 23. http://www.ms.unimelb.edu.au/FAwR/ > 24. 
http://www.ms.unimelb.edu.au/spuRs/ -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Thu May 5 01:43:32 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 09:43:32 +1000 Subject: [R] Problems saving ff objects In-Reply-To: <968391.72417.qm@web28208.mail.ukl.yahoo.com> References: <86317.99948.qm@web28204.mail.ukl.yahoo.com> <968391.72417.qm@web28208.mail.ukl.yahoo.com> Message-ID: <20110504234332.GD827@ms.unimelb.edu.au> I wonder if this question should be directed to the package maintainer? Best wishes, Andrew On Wed, May 04, 2011 at 02:31:51PM +0100, Jannis wrote: > Just did some more testing.....May the problem be due to the fact that I am using a windows machine? I just ran the same code on a Linux machine and everything worked fine. > > If windows (or the file system of the disk) caused the problem, is there any way to resolve it? I know that using Linux would be a better choice ;-) but unfortunatley this in no option at the moment.... > > > Best > Jannis > > --- Jannis schrieb am Mi, 4.5.2011: > > > Von: Jannis > > Betreff: [R] Problems saving ff objects > > An: r-help at r-project.org > > Datum: Mittwoch, 4. Mai, 2011 13:17 Uhr > > Dear list, > > > > > > I am trying to understand and use the ff package. As I had > > some problems saving some ff objects, and as I did not fully > > manage to understand the whole concept of *.ff, *.ffData and > > *.RData with the help of the documentation, I tried to > > reproduce the examples from the help of ffsave. 
> >
> > When I ran, however : (copied from the help)
> >
> > message("let's create some ff objects")
> >   n <- 8e3
> >   a <- ff(sample(n, n, TRUE), vmode="integer", length=n, filename="d:/tmp/a.ff")
> >   b <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/b.ff")
> >   x <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/x.ff")
> >   y <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/y.ff")
> >   z <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="d:/tmp/z.ff")
> >   df <- ffdf(x=x, y=y, z=z)
> >   rm(x,y,z)
> >
> >   message("save all of them")
> >   ffsave.image("d:/tmp/x")
> >
> > I get:
> >
> > Error in ffsave(list = ls(envir = .GlobalEnv, all.names = TRUE), file = outfile,  :
> >   the previous files do not match the rootpath (case sensitive)
> >
> >
> > Whats wrong here? Should this not be working as I did not
> > change anything in the code?
> >
> >
> >
> > Cheers
> > Jannis
> >
> >
> > > sessionInfo()
> > R version 2.12.0 (2010-10-15)
> > Platform: i386-pc-mingw32/i386 (32-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252
> > [2] LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252
> > [4] LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] tools     stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > other attached packages:
> > [1] ff_2.2-2   bit_1.1-7  rj_0.5.2-1
> >
> > loaded via a namespace (and not attached):
> > [1] rJava_0.8-8
> >
> >
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Thu May 5 01:56:53 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 09:56:53 +1000 Subject: [R] Panels order in lattice graphs In-Reply-To: <4DC175DD.2090806@ipimar.pt> References: <4DC175DD.2090806@ipimar.pt> Message-ID: <20110504235653.GE827@ms.unimelb.edu.au> Hi Cristina, you can probably hack your own solution using the index.cond argument. Cheers Andrew On Wed, May 04, 2011 at 04:50:53PM +0100, Cristina Silva wrote: > Hi all, > > In lattice graphs, panels are drawn from left to right and bottom to > top. The flag "as.table=TRUE" changes to left to right and top to > bottom. Is there any way to change to first top to bottom and then left > to right? didn?t find anything neither in Help pages nor Lattice book. > > Cristina > > -- > ------------------------------------------ > Cristina Silva > INRB/L-IPIMAR > Unidade de Recursos Marinhos e Sustentabilidade > Av. 
de Bras?lia, 1449-006 Lisboa > Portugal > Tel.: 351 21 3027096 > Fax: 351 21 3015948 > csilva at ipimar.pt > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From dan.abner99 at gmail.com Thu May 5 01:16:32 2011 From: dan.abner99 at gmail.com (Dan Abner) Date: Wed, 4 May 2011 19:16:32 -0400 Subject: [R] Obtaining the name of an object XXXX Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jan.kacaba at gmail.com Thu May 5 01:41:42 2011 From: jan.kacaba at gmail.com (derek) Date: Wed, 4 May 2011 16:41:42 -0700 (PDT) Subject: [R] print elements' placings from vector In-Reply-To: References: <1304539368841-3496645.post@n4.nabble.com> Message-ID: <1304552502472-3497080.post@n4.nabble.com> Yes, thank you. -- View this message in context: http://r.789695.n4.nabble.com/print-elements-placings-from-vector-tp3496645p3497080.html Sent from the R help mailing list archive at Nabble.com. From sunayan at gmail.com Thu May 5 01:08:40 2011 From: sunayan at gmail.com (sunny) Date: Wed, 4 May 2011 16:08:40 -0700 (PDT) Subject: [R] split character vector by multiple keywords simultaneously Message-ID: <1304550520928-3497033.post@n4.nabble.com> Hi. 
I have a character vector that looks like this:

> temp <- c("Company name: The first company General Manager: John Doe I Managers: John Doe II, John Doe III",
+           "Company name: The second company General Manager: Jane Doe I",
+           "Company name: The third company Managers: Jane Doe II, Jane Doe III")
> temp
[1] "Company name: The first company General Manager: John Doe I Managers: John Doe II, John Doe III"
[2] "Company name: The second company General Manager: Jane Doe I"
[3] "Company name: The third company Managers: Jane Doe II, Jane Doe III"

I know all the keywords, i.e. "Company name:", "General Manager:", "Managers:" etc. I'm looking for a way to split this character vector into multiple character vectors, with one column for each keyword and the corresponding values for each, i.e.

  Company name        General Manager  Managers
1 The first company   John Doe I       John Doe II, John Doe III
2 The second company  Jane Doe I
3 The third company                    Jane Doe II, Jane Doe III

I have tried a lot to find something suitable but haven't so far. Any help will be greatly appreciated. I am running R-2.12.1 on x86_64 linux.

Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/split-character-vector-by-multiple-keywords-simultaneously-tp3497033p3497033.html
Sent from the R help mailing list archive at Nabble.com.

From rolf.turner at xtra.co.nz Thu May 5 03:54:54 2011
From: rolf.turner at xtra.co.nz (Rolf Turner)
Date: Thu, 05 May 2011 13:54:54 +1200
Subject: [R] Obtaining the name of an object XXXX
In-Reply-To:
References:
Message-ID: <4DC2036E.7080707@xtra.co.nz>

On 05/05/11 11:16, Dan Abner wrote:
> Hello everyone,
>
> How does one write a function to return the name of an input object (that is
> assumed to be a data frame) as a character string? I tried using get(),
> but this does not work as I had hoped.
For example: > > myfn<-function(x){ > > output<-data.frame(Attribute="Data Set Name",Value=as.character(get(x))) > > print(output) > > } foo <- function(x) { deparse(substitute(x)) } cheers, Rolf Turner From A.Robinson at ms.unimelb.edu.au Thu May 5 04:08:34 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 12:08:34 +1000 Subject: [R] fGarch In-Reply-To: References: Message-ID: <20110505020834.GA11866@ms.unimelb.edu.au> Hi Paul, I suggest that you should send us commented, minimal, self-contained, reproducible code. That means, in essence, developing the simplest possible representation of your problem. In the process of developing the simplest possible representation, you may learn more about the problem. Maybe even solve it. Even if you don't, then you enable us to make a much better contribution, because we can actually try out our suggestions before sending them. With what you sent here, all we can do is speculate. Cheers, Andrew On Wed, May 04, 2011 at 04:33:12PM -0400, Paul Ossenbruggen wrote: > Hi, > > I am attempting to fit a ARMA/GARCH regression model without success. > > ### ARIMA-GARCH model with regressor ### > > ### Time series data: A multivariate data set. > cov.ts.dq = cov.ts[1:4,"dq1"][!is.na(cov.ts[,"dq1"])] > cov.ts.day = ts.intersect(dq = diff(q.ts), day = lag(q.ts, -1)) > > ### The following R scripts work: > (summary(no.day.fitr <- garchFit(dq ~ arma(0,3) + garch(1,1), data = cov.ts.day))) > (summary(no.day.fitr2 <- garchFit(dq ~ arma(0,3) + garch(1,1), data = cov.ts.day, > include.mean=FALSE))) > > ### ERROR: I add in the regressor "day". 
> (summary(no.day.fitr3 <- garchFit(dq ~ day + arma(0,3) + garch(1,1), data = cov.ts.day, > include.mean=FALSE))) > ### Error in .garchArgsParser(formula = formula, data = data, trace = FALSE) : > ### object 'formula.mean' not found > > ### ERROR: > day.fitr4 <- garchFit(formula.mean = dq ~ day + arma(0,3),formula.var = ~garch(1,0), data = cov.ts.day,include.mean = FALSE) > ### Error in garchFit(formula.mean = dq ~ day + arma(0, 3), formula.var = ~garch(1, : > ### Multivariate data inputs require lhs for the formula. > ### Note: If I remove "day" I obtain the same error message. > > I would greatly appreciate knowing how to overcome this problem. > > Paul > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Thu May 5 04:12:49 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 12:12:49 +1000 Subject: [R] two-way group mean prediction in survreg with three factors In-Reply-To: References: Message-ID: <20110505021249.GB11866@ms.unimelb.edu.au> I hope not! Facetiousness aside, the model that you have fit contains C, and, indeed, an interaction between A and C. So, the effect of A upon the response variable depends on the level of C. 
The summary you want must marginalize C somehow, probably by a weighted or unweighted average across its levels. What does that summary really mean? Can you meaningfully average across the levels of a predictor that is included in the model as a main and an interaction term? Best wishes Andrew On Wed, May 04, 2011 at 12:24:50PM -0400, Pang Du wrote: > I'm fitting a regression model for censored data with three categorical > predictors, say A, B, C. My final model based on the survreg function is > > Surv(..) ~ A*(B+C). > > I know the three-way group mean estimates can be computed using the predict > function. But is there any way to obtain two-way group mean estimates, say > estimated group mean for (A1, B1)-group? The sample group means don't > incorporate censoring and thus may not be appropriate here. > > > > Pang Du > > Virginia Tech > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
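A minimal sketch of the marginalization discussed in the reply above, under loudly stated assumptions: the data are simulated, and the factor levels (A1, B1, C1, ...), the Weibull event times and the exponential censoring times are all invented here rather than taken from the original post. It fits the same model form with survreg(), predicts on the full A x B x C grid, and averages the predictions over the levels of C, once unweighted and once weighted by the observed frequencies of C. Whether such an average is meaningful is exactly the question raised in the reply, and the sketch does not settle it.

```r
library(survival)

# Simulated, purely illustrative data (not from the original post)
set.seed(1)
n <- 300
d <- data.frame(A = factor(sample(c("A1", "A2"), n, TRUE)),
                B = factor(sample(c("B1", "B2"), n, TRUE)),
                C = factor(sample(c("C1", "C2", "C3"), n, TRUE)))
true_time <- rweibull(n, shape = 1.5, scale = 10)
cens_time <- rexp(n, rate = 1 / 20)
d$time    <- pmin(true_time, cens_time)            # observed time
d$status  <- as.integer(true_time <= cens_time)    # 1 = event, 0 = censored

# Same model form as in the question
fit <- survreg(Surv(time, status) ~ A * (B + C), data = d)

# Predictions for every three-way combination of factor levels.
# type = "response" gives predictions on the original time scale.
grid <- expand.grid(A = levels(d$A), B = levels(d$B), C = levels(d$C))
grid$pred <- predict(fit, newdata = grid, type = "response")

# Unweighted average over the levels of C, one value per (A, B) cell
unweighted <- aggregate(pred ~ A + B, data = grid, FUN = mean)

# Average weighted by the observed proportions of each level of C
w <- prop.table(table(d$C))
grid$w <- as.numeric(w[as.character(grid$C)])
weighted <- sapply(split(grid, interaction(grid$A, grid$B)),
                   function(g) weighted.mean(g$pred, g$w))

print(unweighted)
print(weighted)
```

The two averages answer different questions: the unweighted one treats every level of C as equally important, while the weighted one reflects the mix of C in the sample at hand.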
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Thu May 5 04:22:16 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 12:22:16 +1000 Subject: [R] split character vector by multiple keywords simultaneously In-Reply-To: <1304550520928-3497033.post@n4.nabble.com> References: <1304550520928-3497033.post@n4.nabble.com> Message-ID: <20110505022216.GC11856@ms.unimelb.edu.au> A hack would be to use gsub() to prepend e.g. XXX to the keywords that you want, perform a strsplit() to break the lines into component strings, and then substr() to extract the pieces that you want from those strings. Cheers Andrew On Wed, May 04, 2011 at 04:08:40PM -0700, sunny wrote: > Hi. I have a character vector that looks like this: > > > temp <- c("Company name: The first company General Manager: John Doe I > > Managers: John Doe II, John Doe III","Company name: The second company > > General Manager: Jane Doe I","Company name: The third company Managers: > > Jane Doe II, Jane Doe III") > > temp > [1] "Company name: The first company General Manager: John Doe I Managers: > John Doe II, John Doe III" > [2] "Company name: The second company General Manager: Jane Doe I" > [3] "Company name: The third company Managers: Jane Doe II, Jane Doe III" > > I know all the keywords, i.e. "Company name:", "General Manager:", > "Managers:" etc. I'm looking for a way to split this character vector into > multiple character vectors, with one column for each keyword and the > corresponding values for each, i.e. 
> > Company name General Manager Managers > 1 The first company John Doe I John Doe II, John > Doe III > 2 The second company Jane Doe I > 3 The third company Jane Doe II, > Jane Doe III > > I have tried a lot to find something suitable but haven't so far. Any help > will be greatly appreciated. I am running R-2.12.1 on x86_64 linux. > > Thanks. > > -- > View this message in context: http://r.789695.n4.nabble.com/split-character-vector-by-multiple-keywords-simultaneously-tp3497033p3497033.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Thu May 5 05:25:42 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 13:25:42 +1000 Subject: [R] Uniform Gaussian Kernel In-Reply-To: <1304519420181-3495742.post@n4.nabble.com> References: <1304519420181-3495742.post@n4.nabble.com> Message-ID: <20110505032542.GC11866@ms.unimelb.edu.au> I can't honestly say that I grasp what you're trying to do, but that said, I wonder if the curve() function will help you? Cheers Andrew On Wed, May 04, 2011 at 07:30:20AM -0700, blutack wrote: > I have a vector with lots of different numbers. I need to make a graph > showing the Uniform Distribution of the figures. 
I have created a graph > showing all the different values, but now want individual Gaussian Kernel > round each point. This is what I have but each time it comes up with an > error as I have just based it on the Normal Distribution, but I'm not sure > what I need to change to make it work. Where z is my vector. > > plot(0, 0, xlim=range(0, 300), ylim=range(0, 1), pch=NA,) > for(i in 1:length(z)) { > points(z[i], 0, pch="|") > } > > x = seq(-10, 10, 0.01) > for(i in 1:length(z)){ > std_dev = 1 > lines(x, dunif(x, z[i], sd = std_dev)) > } > > Any ideas? Thanks. > > -- > View this message in context: http://r.789695.n4.nabble.com/Uniform-Gaussian-Kernel-tp3495742p3495742.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
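A minimal sketch of one reading of the question quoted above, assuming the aim is to draw a Gaussian kernel around each data point. dunif() takes min and max and has no sd argument, which would explain the reported error, whereas dnorm() takes mean and sd. The vector z and the bandwidth of 5 below are invented for illustration and are not the poster's data.

```r
# Hypothetical data standing in for the poster's vector z
set.seed(42)
z <- runif(20, min = 0, max = 300)
std_dev <- 5    # kernel standard deviation; an arbitrary illustrative value

# Evaluate the kernels on a grid that spans the plotting range,
# rather than on seq(-10, 10), which would miss most of the data
x <- seq(0, 300, by = 0.1)

# One column per data point: a normal density centred on that point
kernels <- sapply(z, function(zi) dnorm(x, mean = zi, sd = std_dev))

# Empty plot, tick marks at the data, then one curve per point
plot(0, 0, type = "n", xlim = c(0, 300), ylim = c(0, max(kernels)),
     xlab = "z", ylab = "density")
points(z, rep(0, length(z)), pch = "|")
for (i in seq_along(z)) {
  lines(x, kernels[, i])
}
```

Averaging the columns with rowMeans(kernels) would give a simple kernel density estimate built from the same curves.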
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From petr.pikal at precheza.cz Thu May 5 06:41:41 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Thu, 5 May 2011 06:41:41 +0200 Subject: [R] SAPPLY function XXXX In-Reply-To: <4DC1B693.9020508@ccbr.umn.edu> References: <4DC1B693.9020508@ccbr.umn.edu> Message-ID: Hi r-help-bounces at r-project.org napsal dne 04.05.2011 22:26:59: > Erik Iverson > Odeslal: r-help-bounces at r-project.org > > 04.05.2011 22:26 > > Komu > > Dan Abner > > > > Ultimately, I would like for this to be 1 conponent in a larger function > > that will produce PROC CONTENTS style output. Something like... > > > > data1.contents<-data.frame(Variable=names(data1), > > Class=sapply(data1,class), > > n.valid=sapply(data1,sum(!is.na)), > > n.miss=sapply(data1,sum(is.na))) > > data1.contents > > Also meant to mention to see ?describe in the Hmisc package: > > E.g., > > > describe(c(NA, 1:10)) > > There is also a useful method for data.frame objects. colSums(is.na(data1)) colSums(!is.na(data1)) may also show number of missing and nonmissing values in data frame. Regards Petr > > --Erik > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
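A minimal sketch of the PROC CONTENTS style summary discussed in the message above, assuming a small invented data frame in place of data1. The quoted attempt passes sum(!is.na) to sapply(), which is not a function and would fail; the sketch uses the colSums(is.na()) idiom from the reply instead.

```r
# Hypothetical data frame standing in for data1
data1 <- data.frame(id    = 1:5,
                    score = c(2.5, NA, 3.1, NA, 4.0),
                    group = c("a", "b", NA, "b", "a"),
                    stringsAsFactors = FALSE)

data1.contents <- data.frame(
  Variable  = names(data1),
  # class() can return several values, so keep only the first
  Class     = sapply(data1, function(col) class(col)[1]),
  n.valid   = colSums(!is.na(data1)),   # non-missing values per column
  n.miss    = colSums(is.na(data1)),    # missing values per column
  row.names = NULL
)
print(data1.contents)
```

An equivalent per-column form would be sapply(data1, function(col) sum(is.na(col))), which is closer to what the quoted code was trying to express.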
From A.Robinson at ms.unimelb.edu.au Thu May 5 07:31:04 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 15:31:04 +1000 Subject: [R] bivariate linear interpolation In-Reply-To: References: Message-ID: <20110505053104.GD11866@ms.unimelb.edu.au> The one that gives results that you trust and uses algorithms that you understand! Cheers Andrew On Wed, May 04, 2011 at 12:52:03PM +0000, Halldór Björnsson wrote: > Hi, > > I have three matrices (X,Y,P) with the same dimension. The X,Y grid is > regular and I want to > perform linear interpolation to pick out certain points. In matlab > appropriate call is > something like > > Pout=interp2(X,Y,P,Xout,Yout, method="linear") > > where Xout and Yout are the locations where I want the Pout data > (typically a different grid). > (Scipy has this routine in interpolate.interp2d, with similar arguments) > > > In R there is (as often) the choice between many different > interpolation routines. Akima has one for irregularly spaced > data (and does not like co-linearity in the data). Fields has another > one, with a more complicated arguments. > > What is the best R function that accomplishes this? > > Sincerely > Halldór > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
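In the spirit of "algorithms that you understand", here is a minimal base R sketch of bilinear interpolation on a regular grid. It is not a function from any package mentioned in the thread: the name interp_bilinear and its argument layout are invented for illustration. It assumes the grid is described by two increasing coordinate vectors x and y with P[i, j] holding the value at (x[i], y[j]); with the matrix-shaped X and Y of the question, those vectors would have to be taken from one column of X and one row of Y (or the reverse, depending on orientation).

```r
# Bilinear interpolation on a regular grid (illustrative helper, base R only).
# x, y       : strictly increasing grid coordinates
# P          : matrix with nrow = length(x), ncol = length(y)
# xout, yout : coordinates of the query points (equal length)
interp_bilinear <- function(x, y, P, xout, yout) {
  stopifnot(nrow(P) == length(x), ncol(P) == length(y),
            length(xout) == length(yout))
  # Index of the lower-left corner of the cell containing each query point;
  # all.inside = TRUE keeps the index in 1..(n - 1), so points outside the
  # grid are extrapolated from the nearest edge cell
  i <- findInterval(xout, x, rightmost.closed = TRUE, all.inside = TRUE)
  j <- findInterval(yout, y, rightmost.closed = TRUE, all.inside = TRUE)
  # Relative position within the cell (between 0 and 1 inside the grid)
  tx <- (xout - x[i]) / (x[i + 1] - x[i])
  ty <- (yout - y[j]) / (y[j + 1] - y[j])
  # Weighted average of the four surrounding grid values
  (1 - tx) * (1 - ty) * P[cbind(i, j)] +
    tx * (1 - ty) * P[cbind(i + 1, j)] +
    (1 - tx) * ty * P[cbind(i, j + 1)] +
    tx * ty * P[cbind(i + 1, j + 1)]
}

# Usage on an invented grid: a plane is reproduced exactly by this scheme
x <- seq(0, 1, by = 0.25)
y <- seq(0, 2, by = 0.5)
P <- outer(x, y, function(a, b) 2 * a + 3 * b)
Xout <- c(0.1, 0.6, 1.0)
Yout <- c(0.3, 1.9, 2.0)
Pout <- interp_bilinear(x, y, P, Xout, Yout)
print(Pout)
```

Checking the result against a surface with a known answer, as in the usage lines, is one way to build the trust the reply asks for before switching to a packaged routine.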
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From gunter.berton at gene.com Thu May 5 07:55:23 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Wed, 4 May 2011 22:55:23 -0700 Subject: [R] best subset regression in R In-Reply-To: <792436.61050.qm@web38308.mail.mud.yahoo.com> References: <792436.61050.qm@web38308.mail.mud.yahoo.com> Message-ID: On Wed, May 4, 2011 at 9:47 AM, FMH wrote: > Dear All, > > Could someone please give some advice the way to do linear modelling via best subset regression in R?... Yes. Don't do it. -- Bert (Very Brief Explanation: Best subset regression was a questionable approach to parsimonious modeling largely dictated by the statistical/computing technology available in the 1960's and 70's. It should by now be abandoned, buried, and forgotten. Use shrinkage instead. LARS/LASSO (in the glmnet package) are among the possibilities. Consult your local statistician for help (after making sure he/she knows about such approaches, as not all do). Frank Harrell's "Regression Modeling Strategies" is a useful starting point to learn about this.) > > Thanks, > Kagba > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them.
If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From russ.abbott at gmail.com Thu May 5 08:47:00 2011 From: russ.abbott at gmail.com (Russ Abbott) Date: Wed, 4 May 2011 23:47:00 -0700 Subject: [R] quantmod's addTA plotting functions Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ing.mgpp at gmail.com Thu May 5 07:31:36 2011 From: ing.mgpp at gmail.com (Maximo Polanco) Date: Thu, 5 May 2011 01:31:36 -0400 Subject: [R] Question about error of "non-numeric argument to binary operator" Message-ID: <000001cc0ae5$b4a82a10$1df87e30$@gmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From vkedzior at is.pw.edu.pl Thu May 5 06:21:36 2011 From: vkedzior at is.pw.edu.pl (mateokkk) Date: Wed, 4 May 2011 21:21:36 -0700 (PDT) Subject: [R] R - problem with my loops which operate on raster data Message-ID: <1304569296827-3497533.post@n4.nabble.com> library(rgdal) my_asc=dir("~/Pulpit/dods/karol/TVDI 113_121",pattern=".asc",recursive=T,full.names=T) for (i in 1:length(my_asc)) { r <- readGDAL(my_asc[i]) z <- as.matrix(r) vectordata[i] <- mean(z) vectordatamax[i] <- max(z) vectordatamin[i] <- min(z) vectordev[i] <- sd(z,na.rm=True) hist(z) png(filename="hist"+tostring(i)+".png") }
  • I have tried several modifications of this loop, but it still doesn't work. Which fragment is incorrect?
  • I would also like to use a more complicated pattern (to list only files whose names end with two numbers), but adding something like pattern="_??.asc" does not seem to work.
  • I would like to add one more loop to get a list of folders (instead of manually inserting directories into the my_asc variable), but I have no idea how to do it.
  • I do not know, why my way of creating vectors for mean, max, min and standard deviation values is not working... -- View this message in context: http://r.789695.n4.nabble.com/R-problem-with-my-loops-which-operate-on-raster-data-tp3497533p3497533.html Sent from the R help mailing list archive at Nabble.com. From l.cattarino at uq.edu.au Thu May 5 07:38:02 2011 From: l.cattarino at uq.edu.au (Lorenzo Cattarino) Date: Thu, 5 May 2011 15:38:02 +1000 Subject: [R] lapply, if statement and concatenating to a list Message-ID: <2869E75AAA158C4E936333A119467ADD1F7DB730BF@UQEXMB06.soe.uq.edu.au> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From slamdunkangel666 at gmail.com Thu May 5 08:10:07 2011 From: slamdunkangel666 at gmail.com (nisha chandran) Date: Thu, 5 May 2011 11:40:07 +0530 Subject: [R] Extraction of columns from two matrices Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mark_difford at yahoo.co.uk Thu May 5 08:59:01 2011 From: mark_difford at yahoo.co.uk (Mark Difford) Date: Wed, 4 May 2011 23:59:01 -0700 (PDT) Subject: [R] Panels order in lattice graphs In-Reply-To: <4DC175DD.2090806@ipimar.pt> References: <4DC175DD.2090806@ipimar.pt> Message-ID: <1304578741678-3497691.post@n4.nabble.com> May 04, 2011; 5:50pm Cristina Silva wrote: > In lattice graphs, panels are drawn from left to right and bottom to > top. The flag "as.table=TRUE" changes to left to right and top to > bottom. Is there any way to change to first top to bottom and then left > to right? didn?t find anything neither in Help pages nor Lattice book. Cristina, You have not fully explained your problem. An approach I use for difficult-to-get-right arrangements is the following. 
Say you have two conditioning variables (13 panels in all) and you want to place the last panel on the top row in the first position on the bottom row, but leave everything else the same, then easiest is the following: ## Note: T.xyplot$index.cond is a __list__, so you need to use [[ T.xyplot <- xyplot(Prop ~ OM | interaction(Treatment, Aspect, drop = TRUE), data = myDat) print(T.xyplot) > T.xyplot$index.cond [[1]] [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 T.xyplot$index.cond[[1]] <- c(13, 1:12) print(T.xyplot) Hope this helps to solve your problem. Regards, Mark. ----- Mark Difford (Ph.D.) Research Associate Botany Department Nelson Mandela Metropolitan University Port Elizabeth, South Africa -- View this message in context: http://r.789695.n4.nabble.com/Panels-order-in-lattice-graphs-tp3496022p3497691.html Sent from the R help mailing list archive at Nabble.com. From e.hofstadler at gmail.com Thu May 5 09:08:51 2011 From: e.hofstadler at gmail.com (E Hofstadler) Date: Thu, 5 May 2011 10:08:51 +0300 Subject: [R] memory and bootstrapping Message-ID: hello, the following questions will without doubt reveal some fundamental ignorance, but hopefully you can still help me out. I'd like to bootstrap a coefficient gained on the basis of the coefficients in a logistic regression model (the mean differences in the predicted probabilities between two groups, where each predict() operation uses as the newdata-argument a dataframe of equal size as the original dataframe).I've got 130,000 rows and 7 columns in my dataframe. The glm-model uses all variables (as well as two 2-way interactions). System: - R-version: 2.12.2 - OS: Windows XP Pro, 32-bit - 3.16Ghz intel dual core processor, 2.9GB RAM I'm using the boot package to arrive at the standard errors for this difference, but even with only 10 replications, this takes quite a long time: 216 seconds (perhaps this is partly also due to my inefficiently programmed function underlying the boot-call, I'm also looking into that). 
I wanted to try out calculating a bca-bootstrapped confidence interval, which as I understand requires a lot more replications than normal-theory intervals. Drawing on John Fox' Appendix to his "An R Companion to Applied Regression", I was thinking of trying out 2000 replications -- but this will take several hours to compute on my system (which isn't in itself a major issue though). My Questions: - let's say I try bootstrapping with 2000 replications. Can I be certain that the memory available to R will be sufficient for this operation? - (this relates to statistics more generally): is it a good idea in your opinion to try bca-bootstrapping, or can it be assumed that a normal theory confidence interval will be a sufficiently good approximation (letting me get away with, say, 500 replications)? Best, Esther From Alfredo.Roccato at unicredit.eu Thu May 5 09:19:06 2011 From: Alfredo.Roccato at unicredit.eu (Roccato Alfredo (UniCredit)) Date: Thu, 5 May 2011 09:19:06 +0200 Subject: [R] R: join tables in R In-Reply-To: <20110504201335.GA22684@praha1.ff.cuni.cz> Message-ID: <05A5593D7A426542B6F9D8C3B538CF9C156EB6B5A2@USEXCPWM07.mc01.unicreditgroup.eu> Thanks a lot to Steve Lianoglou and Peter Savicky for their help! Alfredo -----Messaggio originale----- Da: Steve Lianoglou [mailto:mailinglist.honeypot at gmail.com] > I'd to match-merge 2 tables in such a manner that I keep all the rows in table 1, but not the rows that are in both table 1 and 2. >> master <- data.frame(ID=2001:2011) >> train <- data.frame(ID=2004:2006) >> valid <- ??? > in this example table valid should have the following >> str(valid) > Year: int 2001 2002 2003 2007 2008 2009 2010 2011 Are you working with only one column at a time? If so: keep <- !(master$ID %in% train$ID) valid <- master[keep,] -----Messaggio originale----- Da: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Per conto di Petr Savicky Try the following, which assumes that "train" is a subset of "master". 
master <- data.frame(ID=2001:2011) train <- data.frame(ID=2004:2006) valid <- master[! (master[, 1] %in% train[ ,1]), , drop=FALSE] From lebatsnok at gmail.com Thu May 5 09:49:21 2011 From: lebatsnok at gmail.com (Kenn Konstabel) Date: Thu, 5 May 2011 10:49:21 +0300 Subject: [R] lapply, if statement and concatenating to a list In-Reply-To: <2869E75AAA158C4E936333A119467ADD1F7DB730BF@UQEXMB06.soe.uq.edu.au> References: <2869E75AAA158C4E936333A119467ADD1F7DB730BF@UQEXMB06.soe.uq.edu.au> Message-ID: Hi Lorenzo, On Thu, May 5, 2011 at 8:38 AM, Lorenzo Cattarino wrote: > Hi R users > > I was wondering on how to use lapply & co when the applied function has a conditional statement and the output is a 'growing' object. > See example below: > > list1 <- list('A','B','C') > list2 <- c() > > myfun <- function(x,list2) > { > ?one_elem <- x > ?cat('one_elem= ', one_elem, '\n') > ?random <- sample(1:2,1) > ?show(random) > ?if(random==2) > ?{ > ? ?list2 <- c(list2,one_elem) > ?}else{ > ? ?list2 > ?} > } > > lapply(list1,myfun,list2) > > Is there a way to get rid of the 'NULL' elements in the output (when there is any?), without using a for loop? I don't understand what your example is trying to do and which object you expect to be "growing". list2 ain't growin', and it's not changing (i.e., it remains NULL) in your code. Perhaps you meant to have a <<- there; this would make your list2 "growing", if you really want it to, but in general, that's a bad idea. Lapply goes best with the functional style where everything your function does is computing and returning a value but here you're (if I get your intentions correctly) counting on side effects. If you like side effects, a for (or while) loop may be more logical choice. Getting rid of the NULL elements is simple. One way is: foo <- lapply(list1, yourfun) foo[!sapply(foo, is.null)] Regards, Kenn > ? ? ? 
?[[alternative HTML version deleted]] From lucia.canas at co.ieo.es Thu May 5 09:59:26 2011 From: lucia.canas at co.ieo.es (=?iso-8859-1?Q?Lucia_Ca=F1as?=) Date: Thu, 5 May 2011 09:59:26 +0200 Subject: [R] combine lattice plot and standard R plot Message-ID: <50EB6473669C6741AC34FF5692DCC4564BED44@ieocoruna.co.ieo.es> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ripley at stats.ox.ac.uk Thu May 5 10:01:46 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Thu, 5 May 2011 09:01:46 +0100 (BST) Subject: [R] memory and bootstrapping In-Reply-To: References: Message-ID: The only reason the boot package will take more memory for 2000 replications than 10 is that it needs to store the results. That is not to say that on a 32-bit OS the fragmentation will not get worse, but that is unlikely to be a significant factor. As for the methodology: 'boot' is support software for a book, so please consult it (and not secondary sources). From your brief description it looks to me as if you should be using studentized CIs. 130,000 cases is a lot, and running the experiment on a 1% sample may well show that asymptotic CIs are good enough. On Thu, 5 May 2011, E Hofstadler wrote: > hello, > > the following questions will without doubt reveal some fundamental > ignorance, but hopefully you can still help me out. > > I'd like to bootstrap a coefficient gained on the basis of the > coefficients in a logistic regression model (the mean differences in > the predicted probabilities between two groups, where each predict() > operation uses as the newdata-argument a dataframe of equal size as > the original dataframe).I've got 130,000 rows and 7 columns in my > dataframe. The glm-model uses all variables (as well as two 2-way > interactions). 
> > System: > - R-version: 2.12.2 > - OS: Windows XP Pro, 32-bit > - 3.16Ghz intel dual core processor, 2.9GB RAM > > I'm using the boot package to arrive at the standard errors for this > difference, but even with only 10 replications, this takes quite a > long time: 216 seconds (perhaps this is partly also due to my > inefficiently programmed function underlying the boot-call, I'm also > looking into that). > > I wanted to try out calculating a bca-bootstrapped confidence > interval, which as I understand requires a lot more replications than > normal-theory intervals. Drawing on John Fox' Appendix to his "An R > Companion to Applied Regression", I was thinking of trying out 2000 > replications -- but this will take several hours to compute on my > system (which isn't in itself a major issue though). > > My Questions: > - let's say I try bootstrapping with 2000 replications. Can I be > certain that the memory available to R will be sufficient for this > operation? > - (this relates to statistics more generally): is it a good idea in > your opinion to try bca-bootstrapping, or can it be assumed that a > normal theory confidence interval will be a sufficiently good > approximation (letting me get away with, say, 500 replications)? > > > Best, > Esther > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From ivan.calandra at uni-hamburg.de Thu May 5 10:27:47 2011 From: ivan.calandra at uni-hamburg.de (Ivan Calandra) Date: Thu, 05 May 2011 10:27:47 +0200 Subject: [R] Str info. 
Thanks for helping In-Reply-To: References: <5188382.61183.1304533027791.JavaMail.nabble@joe.nabble.com> Message-ID: <4DC25F83.7080605@uni-hamburg.de> Hi William! So... many things to say First, your first steps are completely unnecessary. Just do this: data1 <- read.csv(file.choose(), header=TRUE) data1$IND <- factor(data1$IND) M <- manova(as.matrix(data1[,-1])~data1$IND, data=data1) Though I don't understand why this doesn't work: M <- manova(.~IND, data=data1) Error in eval(expr, envir, enclos) : object '.' not found Second, this doesn't work either: S<-summary(M, test="Pillai") Error in summary.manova(M, test = "Pillai") : residuals have rank 11 < 12 And I don't know what that means since I never do manova, but it's not really where you're stuck, I guess Third, where you're stuck. Is this what you want? SA<-summary.aov(M) out <- lapply(seq_along(SA), FUN=function(x) unlist(SA[[x]][1,])) mat <- do.call("rbind", out) row.names(mat) <- names(SA) Fourth, I wouldn't use as.vector(). I'm no R expert, but I have the impression that it is useless in this case. It probably has its use in some situations, but I think as.matrix(), as.data.frame() and so on would be more useful to you. And last, reply to the list as well!! Since attached files are not transferred, copy the output from dput(data1) into the email, like this: data1 <- structure(list(IND = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, ... "Dep1", "Dep2", "Dep3", "Dep4", "Dep5", "Dep6", "Dep7", "Dep8", "Dep9", "Dep10", "Dep11", "Dep12"), row.names = c(NA, -133L), class = "data.frame") ## copy the whole thing, right!! HTH, Ivan Le 5/5/2011 02:41, Reith, William [USA] a ?crit : > CSV file is attached. > > I am doing a manova test. Followed by a summary.aov to determine which of the 12 dependent variables are significant and warrant further testing. There is only one independent variable which is categorical with 4 factors. 
> > R Code: > > data1<- read.csv(file link, header=TRUE) > as.matrix(data1) > X<-data1[,1] > Dep1<-data1[,2] > Dep2<-data1[,3] > Dep3<-data1[,4] > Dep4<-data1[,5] > Dep5<-data1[,6] > Dep6<-data1[,7] > Dep7<-data1[,8] > Dep8<-data1[,9] > Dep9<-data1[,10] > Dep10<-data1[,11] > Dep11<-data1[,12] > Dep12<-data1[,13] > Y<-cbind(Dep1, Dep2, Dep3, Dep4, Dep5, Dep6, Dep7, Dep8, Dep9, Dep10, Dep11, Dep12) > M<- manova(Y ~ as.factor(X), data=data1) > > S<-summary(M, test="Pillai") > S1<-as.vector(S$stats[1,]) > L<-mat.or.vec(12,3) > SA<-summary.aov(M) > > #Stops working here. I want to save the numbers from each Dep"i" test as a matrix or vector. S1 above works for the manova test, but I don't know how to reference my values for summary.aov > > SA1<-SA[" Response Dep1"] > > > Thank you so much for any help you can give, > > William > > > > -----Original Message----- > From: Ivan Calandra [mailto:ivan.calandra at uni-hamburg.de] > Sent: Wednesday, May 04, 2011 4:38 PM > To: Reith, William [USA] > Cc: r-help at r-project.org > Subject: Re: Str info. Thanks for helping > > It looks from str(SA) that Response IPS1 is a data.frame of class "anova", which probably cannot be coerced to vector. > > Maybe you can use unlist() instead of as.vector() > Or something like > SA[["Response IPS1"]]["as.factor(WSD)",] ## to select the first row only, even maybe with unlist() > > Without a better REPRODUCIBLE example, I cannot tell more (maybe some others can, that's why I reply to the list) > > HTH, > Ivan > > > Le 4 mai 2011 ? 20:17, reith_william at bah.com a ?crit : > >> I am still waiting for this to get posted so I thought I would email it to you. >> >> SA gives the output: >> >> Response IPS1 : >> Df Sum Sq Mean Sq F value Pr(>F) >> as.factor(WSD) 3 3.3136 1.10455 23.047 5.19e-12 *** >> Residuals 129 6.1823 0.04793 >> --- >> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 >> . >> . >> . >> There are 11 more just like this output. Just increment IPS1 to IPS2, etc. 
>> >> >> Goal: save "3 3.3136 1.10455 23.047 5.19e-12" as a vector. >> >> >> Str(SA) gives the output: >> >> str(SA) >>> str(SA) >> List of 12 >> " $ Response IPS1 :Classes 'anova' and 'data.frame': 2 obs. of 5 variables:" >> ..$ Df : num [1:2] 3 129 >> ..$ Sum Sq : num [1:2] 3.31 6.18 >> ..$ Mean Sq: num [1:2] 1.1045 0.0479 >> ..$ F value: num [1:2] 23 NA >> ..$ Pr(>F) : num [1:2] 5.19e-12 NA >> >> >> There are several more but they are just repeats of this one only with IPS2, IPS3,... >> >> The command: >> >>> SA1<-as.vector(SA$"Reponse IPS1") >> Returns >> >>> NULL >> As do several variations I have tried. Any ideas. > > > -- > Ivan CALANDRA > PhD Student > University of Hamburg > Biozentrum Grindel und Zoologisches Institut und Museum > Abt. S?ugetiere > Martin-Luther-King-Platz 3 > D-20146 Hamburg, GERMANY > +49(0)40 42838 6231 > ivan.calandra at uni-hamburg.de > > ********** > http://www.for771.uni-bonn.de > http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php > -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. S?ugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calandra at uni-hamburg.de ********** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php From dan.abner99 at gmail.com Thu May 5 09:28:43 2011 From: dan.abner99 at gmail.com (Dan Abner) Date: Thu, 5 May 2011 03:28:43 -0400 Subject: [R] Options for print() Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From mark_difford at yahoo.co.uk Thu May 5 09:18:35 2011 From: mark_difford at yahoo.co.uk (Mark Difford) Date: Thu, 5 May 2011 00:18:35 -0700 (PDT) Subject: [R] combine lattice plot and standard R plot In-Reply-To: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es> References: <50EB6473669C6741AC34FF5692DCC4564BED14@ieocoruna.co.ieo.es> Message-ID: <1304579915847-3497717.post@n4.nabble.com> On May 04, 2011 at 8:26pm Lucia Ca?as wrote: > I would like to combine lattice plot (xyplot) and standard R plot (plot > and plotCI) in an unique figure. Hi Lucia, Combining the two systems can be done. See: Paul Murrell. Integrating grid graphics output with base graphics output. R News, 3(2):7-12, October 2003 http://cran.r-project.org/doc/Rnews/Rnews_2003-2.pdf Hope this helps. Regards, Mark. ----- Mark Difford (Ph.D.) Research Associate Botany Department Nelson Mandela Metropolitan University Port Elizabeth, South Africa -- View this message in context: http://r.789695.n4.nabble.com/combine-lattice-plot-and-standard-R-plot-tp3496409p3497717.html Sent from the R help mailing list archive at Nabble.com. From baptiste.auguie at googlemail.com Thu May 5 10:45:49 2011 From: baptiste.auguie at googlemail.com (baptiste auguie) Date: Thu, 5 May 2011 20:45:49 +1200 Subject: [R] Options for print() In-Reply-To: References: Message-ID: Hi, Try this, cat(format("The TITLE", width=80, justify="centre")) HTH, baptiste On 5 May 2011 19:28, Dan Abner wrote: > Hello everyone, > > I have a few questions about the print() fn: > > 1) I have the following code that does not center the character string: > > print("The TITLE",quote=FALSE,justify="center") > > 2) How can I get R to not print the leading [1], etc. ?when using print()? > > (Sorry, I don't know what the leading [1] is called. I tried looking it up > in "An Introduction", but could not find it). > > Any help is greatly appreciated! > > Dan > > ? ? ? 
?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From joda2457 at student.uu.se Thu May 5 10:50:45 2011 From: joda2457 at student.uu.se (Joel) Date: Thu, 5 May 2011 01:50:45 -0700 (PDT) Subject: [R] Remove all whitespaces Message-ID: <1304585445836-3497867.post@n4.nabble.com> Hi I got a string that looks something like this 1 2 3 4 5 6 7 8 9 ... and I want it to be 123456789... So I want to remove all spaces (or whitespaces) from my string. Anyone know a good way of doing this? //Joel -- View this message in context: http://r.789695.n4.nabble.com/Remove-all-whitespaces-tp3497867p3497867.html Sent from the R help mailing list archive at Nabble.com. From mdsumner at gmail.com Thu May 5 10:51:43 2011 From: mdsumner at gmail.com (Michael Sumner) Date: Thu, 5 May 2011 18:51:43 +1000 Subject: [R] R - problem with my loops which operate on raster data In-Reply-To: <1304569296827-3497533.post@n4.nabble.com> References: <1304569296827-3497533.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jeanpaul.ebejer at inhibox.com Thu May 5 10:58:59 2011 From: jeanpaul.ebejer at inhibox.com (JP) Date: Thu, 5 May 2011 09:58:59 +0100 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: <8620F7B8-6856-4E48-A8C8-5D95D210AFDF@gmail.com> References: <8620F7B8-6856-4E48-A8C8-5D95D210AFDF@gmail.com> Message-ID: On 4 May 2011 15:32, peter dalgaard wrote: > > On May 4, 2011, at 15:11 , JP wrote: > >> Peter thanks for the fantastically simple and understandable explanation... >> >> To sum it up... 
to find the z values of a number of pairwise wilcox >> tests do the following: >> >> # pairwise tests with bonferroni correction >> x <- pairwise.wilcox.test(a, b, alternative="two.sided", >> p.adj="bonferroni", exact=F, paired=T) > > > You probably don't want the bonferroni correction there. Rather p.adj="none". You generally correct the p values for multiple testing, not the test statistics. > Oh, I see thanks... of course since I have 5 groups (samples) and 10 comparisons I still have to correct when quoting p values... > (My sentiment would be to pick apart the stats:::wilcox.test.default function and clone the computation of Z from it, but presumably backtracking from the p value is a useful expedient.) > Should this be so onerous for the user [read non-statistician] ? >> # what is the data structure we got back >> is.matrix(x$p.value) >> # p vals >> x$p.value >> # z.scores for each >> z.score <- qnorm(x$p.value / 2) >> > > Hmm, you're not actually getting a signed z out of this, you might want to try alternative="greater" and drop the division by 2 inside qnorm(). (If the signs come out inverted, I meant "less" not "greater"...) > But I need a two sided test (changing the alternative would change the hypothesis!)... do I still do this? All my z values are negative.... Is this correct? From ivan.calandra at uni-hamburg.de Thu May 5 11:02:59 2011 From: ivan.calandra at uni-hamburg.de (Ivan Calandra) Date: Thu, 05 May 2011 11:02:59 +0200 Subject: [R] Remove all whitespaces In-Reply-To: <1304585445836-3497867.post@n4.nabble.com> References: <1304585445836-3497867.post@n4.nabble.com> Message-ID: <4DC267C3.6050607@uni-hamburg.de> Hi Joel, Try this: x <- "1 2 3 4 5 6 7 8 9 " gsub(" ", "", x) Ivan Le 5/5/2011 10:50, Joel a ?crit : > Hi > > I got a string that looks something like this > > 1 2 3 4 5 6 7 8 9 ... > > and I want it to be > > 123456789... > > So I want to remove all spaces (or whitespaces) from my string. > > Anyone know a good way of doing this? 
>
> //Joel
>

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php

From andreas.borg at unimedizin-mainz.de Thu May 5 11:17:04 2011
From: andreas.borg at unimedizin-mainz.de (Andreas Borg)
Date: Thu, 05 May 2011 11:17:04 +0200
Subject: [R] Options for print()
In-Reply-To:
References:
Message-ID: <4DC26B10.9050309@unimedizin-mainz.de>

Also take a look at sprintf, which offers everything the C-equivalent has. sprintf returns a string which can be sent to the console via cat.

Andreas

baptiste auguie wrote:
> Hi,
>
> Try this,
>
> cat(format("The TITLE", width=80, justify="centre"))
>
> HTH,
>
> baptiste
>
> On 5 May 2011 19:28, Dan Abner wrote:
>
>> Hello everyone,
>>
>> I have a few questions about the print() fn:
>>
>> 1) I have the following code that does not center the character string:
>>
>> print("The TITLE",quote=FALSE,justify="center")
>>
>> 2) How can I get R to not print the leading [1], etc. when using print()?
>>
>> (Sorry, I don't know what the leading [1] is called. I tried looking it up
>> in "An Introduction", but could not find it).
>>
>> Any help is greatly appreciated!
>>
>> Dan
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
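The two suggestions in this thread (format() with justify="centre", and sprintf() plus cat()) can be combined roughly as follows. This is only a sketch: the width of 80 is an assumed console width, and the names title, width, pad_left and centred are made up for illustration. cat() is used because it writes the string without the leading [1] index and without quotes.

```r
# Sketch: centre a title in an assumed 80-column line and print it without "[1]".
title <- "The TITLE"
width <- 80                                   # assumption: console is 80 columns wide

# Number of blanks to put in front of the title (never negative).
pad_left <- max(0, floor((width - nchar(title)) / 2))

# sprintf() builds the padded string; strrep() supplies the blanks.
centred <- sprintf("%s%s", strrep(" ", pad_left), title)

# cat() prints the bare string: no [1] prefix, no quotes.
cat(centred, "\n", sep = "")
```

writeLines(centred) would behave much the same as the cat() call here; print() always shows the index prefix for vectors.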
>
>

--
Andreas Borg
Medizinische Informatik

UNIVERSITÄTSMEDIZIN der Johannes Gutenberg-Universität
Institut für Medizinische Biometrie, Epidemiologie und Informatik
Obere Zahlbacher Straße 69, 55131 Mainz
www.imbei.uni-mainz.de

Phone +49 (0) 6131 175062
E-Mail: borg at imbei.uni-mainz.de

This e-mail contains confidential and/or legally protected information. If you are not the intended recipient or have received this e-mail in error, please notify the sender immediately and delete this e-mail. Unauthorized copying or forwarding of this e-mail and of the information it contains is not permitted.

From pschmitz at illinois.edu Thu May 5 11:20:26 2011
From: pschmitz at illinois.edu (Pat Schmitz)
Date: Thu, 5 May 2011 11:20:26 +0200
Subject: [R] Ruby Koans an amazing platform for teaching programming. Would this work with R?
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From sterlesser at hotmail.com Thu May 5 10:20:33 2011
From: sterlesser at hotmail.com (sterlesser)
Date: Thu, 5 May 2011 01:20:33 -0700 (PDT)
Subject: [R] nls problem with R
In-Reply-To:
References: <1304482083098-3494454.post@n4.nabble.com> <20110504071506.GU48756@ms.unimelb.edu.au> <1304518064519-3495672.post@n4.nabble.com>
Message-ID: <1304583633413-3497825.post@n4.nabble.com>

ID1  ID2  t        V(t)
1    1    0        6.053078443
2    1    0.3403   5.56937391
3    1    0.4181   5.45484486
4    1    0.4986   5.193124598
5    1    0.7451   4.31386722
6    1    1.0069   3.645422269
7    1    1.5535   3.587710965
8    1    1.8049   3.740362689
9    1    2.4979   3.699837726
10   1    6.4903   2.908485019
11   1    13.5049  1.888179494
12   1    27.5049  1.176091259
13   1    41.5049  1.176091259

The model
(1) V(t)=V0[1-epi+ epi*exp(-c(t-t0))]
(2) V(t)=V0{A*exp[-lambda1(t-t0)]+(1-A)*exp[-lambda2(t-t0)]}

in formula (2)
lambda1=0.5*{(c+delta)+[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
lambda2=0.5*{(c+delta)-[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
A=(epi*c-lambda2)/(lambda1-lambda2)

The regression rule :
for formula
(1):(t<=2,that is) first 8 rows are used for non-linear regression epi,c,t0,V0 parameters are obtained for formula (2):all 13 rows of results are used for non-linear regression lambda1,lambda2,A (with these parameters, delta can be calculated from them) Thanks for help Ster Lesser -- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497825.html Sent from the R help mailing list archive at Nabble.com. From sterlesser at hotmail.com Thu May 5 10:22:36 2011 From: sterlesser at hotmail.com (sterlesser) Date: Thu, 5 May 2011 01:22:36 -0700 (PDT) Subject: [R] nls problem with R In-Reply-To: <1304583633413-3497825.post@n4.nabble.com> References: <1304482083098-3494454.post@n4.nabble.com> <20110504071506.GU48756@ms.unimelb.edu.au> <1304518064519-3495672.post@n4.nabble.com> <1304583633413-3497825.post@n4.nabble.com> Message-ID: <1304583756539-3497827.post@n4.nabble.com> the dataset's form is changed after my post so I repost it here t 0 0.3403 0.4181 0.4986 0.7451 1.0069 1.5535 1.8049 2.4979 6.4903 13.5049 27.5049 41.5049 V(t) 6.053078443 5.56937391 5.45484486 5.193124598 4.31386722 3.645422269 3.587710965 3.740362689 3.699837726 2.908485019 1.888179494 1.176091259 1.176091259 -- View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497827.html Sent from the R help mailing list archive at Nabble.com. From tom.osborn at iinet.net.au Thu May 5 11:00:00 2011 From: tom.osborn at iinet.net.au (Tom Osborn) Date: Thu, 5 May 2011 19:00:00 +1000 Subject: [R] Remove all whitespaces References: <1304585445836-3497867.post@n4.nabble.com> Message-ID: <20A67FE6E4244F528DEE1944C8DA0984@Entropic> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From r.t.a.j.leenders at rug.nl Thu May 5 11:33:45 2011 From: r.t.a.j.leenders at rug.nl (R.T.A.J.Leenders) Date: Thu, 05 May 2011 11:33:45 +0200 Subject: [R] issue with "strange" characters (readHTMLTable) Message-ID: <7600dff660597.4dc28b19@rug.nl> Thank you. The line of code you give certainly resolves several of the issues. I didn't realize that font support is such a tough matter to realize. Let me express my gratitude to those who provide this for us in R. On 04-05-11, Prof Brian Ripley wrote: Oh, please! This is about the contributed package XML, not R and not Windows. Some of us have worked very hard to provide reasonable font support in R, including on Windows. We are given exceedingly little credit, just the brickbats for things for which we are not responsible. (We even work hard to port XML to Windows for you, again with almost zero credit.) That URL is a page in UTF-8, as its header says. We have provided many ways to work with UTF-8 on Windows, but it seems readHTMLTable() is not making use of them. You need to run iconv() on the strings in your object (which as it has factors, are the levels). When you do so, you will discover that page contains characters not in your native charset (I presume, not having your locale). What you can do, in Rgui only, is for (n in names(Islands)) Encoding(levels(Islands[[n]])) <-"UTF-8" but likely there are still characters it will not know how to display. On Wed, 4 May 2011, R.T.A.J.Leenders wrote: > > WinXP-x32, R-21.13.0 > Dear list, > I have a problem that (I think) relates to the interaction between Windows > and R. 
> I am trying to scrape a table with data on the Hawai'ian Islands, This is my > code: > library(XML) > u <- "[1]http://en.wikipedia.org/wiki/Hawaii" > tables <- readHTMLTable(u) > Islands <- tables[[5]] > The output is (first set of columns): > Island Nickname > > Islands > Island Nickname > Location >1 Hawai???????i[7] The Big Island 19???????34????????????N 155???????30????????????W???????????? / ????????????19.567 >???????N 155.5???????W???????????? / 19.567; -155.5 >2 Maui[8] The Valley Isle 20???????48????????????N 156???????20????????????W???????????? / ????????????20.8???????N >156.333???????W???????????? / 20.8; -156.333 >3 Kaho???????olawe[9] The Target Isle 20???????33????????????N 156???????36????????????W???????????? / ????????????20.55 >???????N 156.6???????W???????????? / 20.55; -156.6 >4 L???na???????i[10] The Pineapple Isle 20???????50????????????N 156???????56????????????W???????????? / ????????????20.833???????N 15 >6.933???????W???????????? / 20.833; -156.933 >5 Moloka???????i[11] The Friendly Isle 21???????08????????????N 157???????02????????????W???????????? / ????????????21.133???????N 1 >57.033???????W???????????? / 21.133; -157.033 >6 O???????ahu[12] The Gathering Place 21???????28????????????N 157???????59????????????W???????????? / ????????????21.467???????N 1 >57.983???????W???????????? / 21.467; -157.983 >7 Kaua???????i[13] The Garden Isle 22???????05????????????N 159???????30????????????W???????????? / ????????????22.083 >???????N 159.5???????W???????????? / 22.083; -159.5 >8 Ni???????ihau[14] The Forbidden Isle 21???????54????????????N 160???????10????????????W???????????? / ????????????21.9???????N >160.167???????W???????????? / 21.9; -160.167 > > As you can see, there are "weird" characters in there. I have also tried > readHTMLTable(u, encoding = "UTF-16") and readHTMLTable(u, encoding = > "UTF-8") > but that didn't help. 
> It seems to me that there may be an issue with the interaction of the > Windows settings of the character set. > sessionInfo() gives > > sessionInfo() > R version 2.13.0 (2011-04-13) > Platform: i386-pc-mingw32/i386 (32-bit) > locale: > [1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 > LC_MONETARY=Dutch_Netherlands.1252 > [4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252 > attached base packages: > [1] stats graphics grDevices utils datasets methods base > other attached packages: > [1] XML_3.2-0.2 > > > I have also attempted to let R use another setting by entering: > Sys.setlocale("LC_ALL", "en_US.UTF-8"), but this yields the response: > > Sys.setlocale("LC_ALL", "en_US.UTF-8") > [1] "" > Warning message: > In Sys.setlocale("LC_ALL", "en_US.UTF-8") : > OS reports request to set locale to "en_US.UTF-8" cannot be honored > > > In addition, I have attempted to make the change directly from the windows > command prompt, using: "chcp 65001" and variations of that, but that didn't > change anything. > I have searched the list and the web and have found others bringing forth a > similar issues, but have not been able to find a solution. I looks like this > is an issue of how Windows and R interact. Unfortunately, all three > computers at my disposal have this problem. It occurs both under WinXP-x32 > and under Win7-x86. > Is there a way to make R override the windows settings or can the issue be > solved otherwise? > I have also tried other websites, and the issue occurs every time when there > is an ????, ????, ????, ????, et cetera in the text-to-be-scraped. > Thank you, > Roger >______________________________________________ >R-help at r-project.org mailing list >[2]https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide [3]http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. > -- Brian D. 
Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, [4]http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 References 1. http://en.wikipedia.org/wiki/Hawaii 2. https://stat.ethz.ch/mailman/listinfo/r-help 3. http://www.R-project.org/posting-guide.html 4. http://www.stats.ox.ac.uk/%7Eripley/ From djmuser at gmail.com Thu May 5 11:37:35 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Thu, 5 May 2011 02:37:35 -0700 Subject: [R] Extraction of columns from two matrices In-Reply-To: References: Message-ID: Hi: Is this what you have in mind? m1 <- matrix(rpois(100, 10), nrow = 10, dimnames = list(NULL, paste('V', 1:10, sep = ''))) m2 <- matrix(rpois(40, 10), nrow = 10, dimnames = list(NULL, paste('V', c(2, 5, 7, 10), sep = ''))) colnames(m1) colnames(m2) m1[, colnames(m2)] HTH, Dennis On Wed, May 4, 2011 at 11:10 PM, nisha chandran wrote: > Hi, > > I have two matrices with 4885 cols and 36 cols respectively . I would like > to extract these 36 columns from the bigger matrix, the column values are > different but the column names would be the same.Hence based on the names I > would like to perform my operation. So is there any way of extracting this > info in an easy way. I have tried names,which,grep,subset none have worked. > Could someone help me out here > > > > Thanks a ton > Nisha > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
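The name-based extraction suggested above, m1[, colnames(m2)], stops with a "subscript out of bounds" error if any name in the small matrix is absent from the big one. A hedged variant that checks the names first might look like this; the matrices big and small and the column name "V99" are invented purely for illustration.

```r
# Sketch: pull columns out of a large matrix by the names found in a small one,
# guarding against names that do not occur in the large matrix.
big   <- matrix(1:30, nrow = 3,
                dimnames = list(NULL, paste0("V", 1:10)))
small <- matrix(0, nrow = 3, ncol = 4,
                dimnames = list(NULL, c("V2", "V5", "V7", "V99")))

common  <- intersect(colnames(small), colnames(big))  # names present in both
missing <- setdiff(colnames(small), colnames(big))    # names only in 'small'

# drop = FALSE keeps the result a matrix even if only one column matches.
sub <- big[, common, drop = FALSE]

if (length(missing) > 0) {
  message("Not found in the big matrix: ", paste(missing, collapse = ", "))
}
```

With 36 names against 4885 columns, inspecting missing first is a quick way to spot spelling or whitespace differences between the two sets of column names.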
>

From deepayan.sarkar at gmail.com Thu May 5 11:44:14 2011
From: deepayan.sarkar at gmail.com (Deepayan Sarkar)
Date: Thu, 5 May 2011 15:14:14 +0530
Subject: [R] Panels order in lattice graphs
In-Reply-To: <4DC175DD.2090806@ipimar.pt>
References: <4DC175DD.2090806@ipimar.pt>
Message-ID:

On Wed, May 4, 2011 at 9:20 PM, Cristina Silva wrote:
> Hi all,
>
> In lattice graphs, panels are drawn from left to right and bottom to top.
> The flag "as.table=TRUE" changes to left to right and top to bottom. Is
> there any way to change to first top to bottom and then left to right?
> didn't find anything neither in Help pages nor Lattice book.

See ?packet.panel.default. For example,

p <- xyplot(mpg ~ disp | factor(carb), mtcars, as.table = TRUE)

print(p, packet.panel = packet.panel.default)

my.packet.panel <-
    function(layout, condlevels, page, row, column, ...)
{
    tlayout <- layout[c(2, 1, 3)] # switch row and column
    print(packet.panel.default(tlayout, condlevels, page = page,
                               row = column, column = row, ...))
}

print(p, packet.panel = my.packet.panel)

-Deepayan

From janko.thyson.rstuff at googlemail.com Thu May 5 11:50:48 2011
From: janko.thyson.rstuff at googlemail.com (Janko Thyson)
Date: Thu, 05 May 2011 11:50:48 +0200
Subject: [R] Remove all whitespaces
In-Reply-To: <4DC267C3.6050607@uni-hamburg.de>
References: <1304585445836-3497867.post@n4.nabble.com> <4DC267C3.6050607@uni-hamburg.de>
Message-ID: <4DC272F8.6020807@googlemail.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From slamdunkangel666 at gmail.com Thu May 5 12:02:43 2011
From: slamdunkangel666 at gmail.com (nisha chandran)
Date: Thu, 5 May 2011 15:32:43 +0530
Subject: [R] Extraction of columns from two matrices
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From djsloan at liv.ac.uk Thu May 5 12:16:46 2011
From: djsloan at liv.ac.uk (dereksloan)
Date: Thu, 5 May 2011 03:16:46 -0700 (PDT)
Subject: [R] Using functions/loops for repetitive commands
Message-ID: <1304590606704-3498006.post@n4.nabble.com>

I still need to do some repetitive statistical analysis on some outcomes from a dataset.

Take the following as an example;

id  sex  hiv  age  famsize  bmi  resprate
1   M    Pos  23   2        16   15
2   F    Neg  24   5        18   14
3   F    Pos  56   14       23   24
4   F    Pos  67   3        33   31
5   M    Neg  34   2        21   23

I want to know if there are statistically detectable differences in all of the continuous variables in my data set when subdivided by sex or hiv status (ie are age, family size, bmi and resprate different in my male and female patients or in hiv pos/neg patients). Of course I can use Wilcoxon or t-tests e.g:

wilcox.test( age~sex)
wilcox.test(famsize~sex)
wilcox.test(bmi~sex)
wilcox.test(resprate~sex)
wilcox.test( age~hiv)
wilcox.test(famsize~hiv)
wilcox.test(bmi~hiv)
wilcox.test(resprate~hiv)

but there must be some easy way of looping/automating this code (i.e. get all the continuous variables analysed one by one by sex, then analysed one by one by hiv status).

Obviously my actual dataset is considerably bigger than what is shown here - I have many variables to assess, making the longhand instruction to do every test pretty unsatisfactory.

I think I can use 'for' or some other looping command for this purpose but I can't work out how. I think I don't properly understand how loops work yet as I'm still quite new to R.

Please could someone help, ideally with an explanation and some quick sample code?

Derek

--
View this message in context: http://r.789695.n4.nabble.com/Using-functions-loops-for-repetitive-commands-tp3498006p3498006.html
Sent from the R help mailing list archive at Nabble.com.
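One way the repetition described above could be automated is sketched below. It assumes the data sit in a data frame (here called dat, a hypothetical name, filled with the five example rows) and that every outcome and grouping variable is named explicitly. The names outcomes, groups and results are illustrative only, and exact = FALSE is chosen merely to avoid tie warnings on such a tiny sample; it is not a recommendation for the real data.

```r
# Sketch: run wilcox.test() for every outcome/grouping combination
# and collect the results in one data frame.
dat <- data.frame(
  id       = 1:5,
  sex      = c("M", "F", "F", "F", "M"),
  hiv      = c("Pos", "Neg", "Pos", "Pos", "Neg"),
  age      = c(23, 24, 56, 67, 34),
  famsize  = c(2, 5, 14, 3, 2),
  bmi      = c(16, 18, 23, 33, 21),
  resprate = c(15, 14, 24, 31, 23)
)

outcomes <- c("age", "famsize", "bmi", "resprate")  # continuous variables
groups   <- c("sex", "hiv")                         # two-level grouping variables

results <- do.call(rbind, lapply(groups, function(g) {
  do.call(rbind, lapply(outcomes, function(v) {
    # reformulate("sex", response = "age") builds the formula age ~ sex
    test <- wilcox.test(reformulate(g, response = v), data = dat, exact = FALSE)
    data.frame(group = g, outcome = v,
               W = unname(test$statistic), p.value = test$p.value)
  }))
}))

print(results)
```

Each row of results holds one test. With many outcomes, the p-values would normally be adjusted for multiple testing, for example with p.adjust(results$p.value, method = "holm").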
From A.Robinson at ms.unimelb.edu.au Thu May 5 12:37:36 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Thu, 5 May 2011 20:37:36 +1000 Subject: [R] nls problem with R In-Reply-To: <1304583633413-3497825.post@n4.nabble.com> References: <1304482083098-3494454.post@n4.nabble.com> <20110504071506.GU48756@ms.unimelb.edu.au> <1304518064519-3495672.post@n4.nabble.com> <1304583633413-3497825.post@n4.nabble.com> Message-ID: <20110505103736.GF11866@ms.unimelb.edu.au> Apologies, but I don't see a question here ... am I missing something obvious? Andrew On Thu, May 05, 2011 at 01:20:33AM -0700, sterlesser wrote: > ID1 ID2 t V(t) > 1 1 0 6.053078443 > 2 1 0.3403 5.56937391 > 3 1 0.4181 5.45484486 > 4 1 0.4986 5.193124598 > 5 1 0.7451 4.31386722 > 6 1 1.0069 3.645422269 > 7 1 1.5535 3.587710965 > 8 1 1.8049 3.740362689 > 9 1 2.4979 3.699837726 > 10 1 6.4903 2.908485019 > 11 1 13.5049 1.888179494 > 12 1 27.5049 1.176091259 > 13 1 41.5049 1.176091259 > > The model > (1) V(t)=V0[1-epi+ epi*exp(-c(t-t0))] > (2) V(t)=V0{A*exp[-lambda1(t-t0)]+(1-A)*exp[-lambda2(t-t0)]} > > in formula (2) lambda1=0.5*{(c+delta)+[(c-delta)^2+4*(1-epi)*c*delta]^0.5} > > lambda2=0.5*{(c+delta)-[(c-delta)^2+4*(1-epi)*c*delta]^0.5} > A=(epi*c-lambda2)/(lambda1-lambda2) > > The regression rule : > for formula (1):(t<=2,that is) first 8 rows are used for non-linear > regression > epi,c,t0,V0 parameters are obtained > for formula (2):all 13 rows of results are used for non-linear regression > lambda1,lambda2,A (with these parameters, delta can be calculated from them) > > Thanks for help > Ster Lesser > > -- > View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497825.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From ehlers at ucalgary.ca Thu May 5 12:39:56 2011 From: ehlers at ucalgary.ca (P Ehlers) Date: Thu, 05 May 2011 04:39:56 -0600 Subject: [R] quantmod's addTA plotting functions In-Reply-To: References: Message-ID: <4DC27E7C.6050204@ucalgary.ca> On 2011-05-05 0:47, Russ Abbott wrote: > Hi, > > I'm having trouble with quantmod's addTA plotting functions. They seem to > work fine when run from the command line. But when run inside a function, > only the last one run is visible. Here's an example. > > > test.addTA<- function(from = "2010-06-01") { > getSymbols("^GSPC", from = from) > GSPC.close<- GSPC[,"GSPC.Close"] > GSPC.EMA.3<- EMA(GSPC.close, n=3, ratio=NULL) > GSPC.EMA.10<- EMA(GSPC.close, n=10, ratio=NULL) > chartSeries(GSPC.close, theme=chartTheme('white'), up.col="black", > dn.col="black") > addTA(GSPC.EMA.3, on = 1, col = "#0000ff") > addTA(GSPC.EMA.10, on = 1, col = "#ff0000") > # browser() > } > > > When I run this, GSPC.close always appears. But only GSPC.EMA10 appears on > the plot along with it. If I switch the order of the addTA calls, > only GSPC.EMA3 appears. If I uncomment the call to browser() neither appears > when the browser() interrupt occurs. I can then draw both GSPC.EMA.3 and > GSPC.EMA10 manually, and let the function terminate. 
All intended plots are > visible after the function terminates. So it isn't as if one wipes out the > other. This shows that it's possible to get all three lines on the plot, but > I can't figure out how to do it without manual intervention. Any suggestions > are appreciated. Perhaps you didn't see this NOTE on the ?TA help page: "Calling any of the above methods from within a function or script will generally require them to be wrapped in a plot call as they rely on the context of the call to initiate the actual charting addition." Peter Ehlers > > Thanks. > > *-- Russ * > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Gerrit.Eichner at math.uni-giessen.de Thu May 5 13:22:47 2011 From: Gerrit.Eichner at math.uni-giessen.de (Gerrit Eichner) Date: Thu, 5 May 2011 13:22:47 +0200 (MEST) Subject: [R] Using functions/loops for repetitive commands In-Reply-To: <1304590606704-3498006.post@n4.nabble.com> References: <1304590606704-3498006.post@n4.nabble.com> Message-ID: Hello, Derek, see below. On Thu, 5 May 2011, dereksloan wrote: > I still need to do some repetitive statistical analysis on some outcomes > from a dataset. 
>
> Take the following as an example;
>
> id sex hiv age famsize bmi resprate
> 1 M Pos 23 2 16 15
> 2 F Neg 24 5 18 14
> 3 F Pos 56 14 23 24
> 4 F Pos 67 3 33 31
> 5 M Neg 34 2 21 23
>
> I want to know if there are statistically detectable differences in all
> of the continuous variables in my data set when subdivided by sex or hiv
> status (ie are age, family size, bmi and resprate different in my male
> and female patients or in hiv pos/neg patients) Of course I can use
> wilcoxon or t-tests e.g:
>
> wilcox.test( age~sex)
> wilcox.test(famsize~sex)
> wilcox.test(bmi~sex)
> wilcox.test(resprate~sex)
> wilcox.test( age~hiv)
> wilcox.test(famsize~hiv)
> wilcox.test(bmi~hiv)
> wilcox.test(resprate~hiv)
>
.... [snip]

Define, e. g.,

my.wilcox.tests <- function( var.names, groupvar.name, data) {
  lapply( var.names,
          function( v) {
            form <- as.formula( paste( v, "~", groupvar.name))
            wilcox.test( form, data = data)
          } )
}

and call something like

my.wilcox.tests( , , data = )

Caveat: untested!

Hth -- Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner
Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de
Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104
Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109
http://www.uni-giessen.de/cms/eichner

From csilva at ipimar.pt Thu May 5 13:25:44 2011
From: csilva at ipimar.pt (Cristina Silva)
Date: Thu, 05 May 2011 12:25:44 +0100
Subject: [R] Panels order in lattice graphs
In-Reply-To:
References: <4DC175DD.2090806@ipimar.pt>
Message-ID: <4DC28938.2010405@ipimar.pt>

Thank you very much.

Cristina

On 05/05/2011 10:44, Deepayan Sarkar wrote:
> On Wed, May 4, 2011 at 9:20 PM, Cristina Silva wrote:
>> Hi all,
>>
>> In lattice graphs, panels are drawn from left to right and bottom to top.
>> The flag "as.table=TRUE" changes to left to right and top to bottom. Is
>> there any way to change to first top to bottom and then left to right?
>> didn?t find anything neither in Help pages nor Lattice book. > See ?packet.panel.default. For example, > > > p<- xyplot(mpg ~ disp | factor(carb), mtcars, as.table = TRUE) > > print(p, packet.panel = packet.panel.default) > > my.packet.panel<- > function(layout, condlevels, page, row, column, ...) > { > tlayout<- layout[c(2, 1, 3)] # switch row and column > print(packet.panel.default(tlayout, condlevels, page = page, > row = column, column = row, ...)) > } > > print(p, packet.panel = my.packet.panel) > > > -Deepayan > -- ------------------------------------------ Cristina Silva INRB/L-IPIMAR Unidade de Recursos Marinhos e Sustentabilidade Av. de Bras?lia, 1449-006 Lisboa Portugal Tel.: 351 21 3027096 Fax: 351 21 3015948 csilva at ipimar.pt From joda2457 at student.uu.se Thu May 5 14:16:04 2011 From: joda2457 at student.uu.se (Joel) Date: Thu, 5 May 2011 05:16:04 -0700 (PDT) Subject: [R] Alter a line in a file. Message-ID: <1304597763938-3498187.post@n4.nabble.com> Hi all R users Ive got a file that contains diffrent settings in the manor of: setting1="value1" setting2="value2" setting3="value3" setting4="value4" . . . What I want to do is open the file and change the value of a specific setting like wanna change setting4="value4" -> setting4="value5" and then save the file again. setting1="value1" setting2="value2" setting3="value3" setting4="value5" . . . -- View this message in context: http://r.789695.n4.nabble.com/Alter-a-line-in-a-file-tp3498187p3498187.html Sent from the R help mailing list archive at Nabble.com. From f.harrell at vanderbilt.edu Thu May 5 14:55:57 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Thu, 5 May 2011 05:55:57 -0700 (PDT) Subject: [R] Draw a nomogram after glm In-Reply-To: <1304596297220-3498144.post@n4.nabble.com> References: <1304596297220-3498144.post@n4.nabble.com> Message-ID: <1304600157752-3498279.post@n4.nabble.com> Please read the documentation for the rms package, particularly the datadist function. 
Note that in your subject line glm should be lrm. Frank Komine wrote: > > Hi all R users > I did a logistic regression with my binary variable Y (0/1) and 2 > explanatory variables. > Now I try to draw my nomogram with predictive value. I visited the help of > R but I have problem to understand well the example. When I use glm > fonction, I have a problem, thus I use lrm. My code is: > modele<-lrm(Y~L+P,data=donnee) > fun<- function(x) plogis(x-modele$coef[1]+modele$coef[2]) > f <- Newlabels(modele,c(L="poids",P="taille")) > nomogram(f, fun=list('Prob Y<=1'=plogis), > fun.at=c(seq(0,1,by=.1),.95,.99), > lmgp=.1, cex.axis=.6) > fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99), > lmgp=.2, cex.axis=.6) > options(Fire=NULL) > Result is bad and I have this following error message: > Erreur dans value.chk(at, i, NA, -nint, Limval, type.range = "full") : > variable L does not have limits defined by datadist > > Could you help me on the code to draw nomogram. > Nb: my English is low, I apologize. > Thank for your help > Komine > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Draw-a-nomogram-after-glm-tp3498144p3498279.html Sent from the R help mailing list archive at Nabble.com. From jholtman at gmail.com Thu May 5 15:02:02 2011 From: jholtman at gmail.com (jim holtman) Date: Thu, 5 May 2011 09:02:02 -0400 Subject: [R] Alter a line in a file. 
In-Reply-To: <1304597763938-3498187.post@n4.nabble.com> References: <1304597763938-3498187.post@n4.nabble.com> Message-ID: try this: a <- readLines(textConnection('setting1="value1" setting2="value2" setting3="value3" setting4="value4"')) closeAllConnections() # change values ac <- sub('setting4="value4"', 'setting4="value5"', a) writeLines(ac, con='myFile.txt') On Thu, May 5, 2011 at 8:16 AM, Joel wrote: > Hi all R users > > Ive got a file that contains diffrent settings in the manor of: > > setting1="value1" > setting2="value2" > setting3="value3" > setting4="value4" > . > . > . > > What I want to do is open the file and change the value of a specific > setting > like wanna change setting4="value4" -> setting4="value5" and then save the > file again. > > setting1="value1" > setting2="value2" > setting3="value3" > setting4="value5" > . > . > . > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Alter-a-line-in-a-file-tp3498187p3498187.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From claudia.beleites at ipht-jena.de Thu May 5 15:01:28 2011 From: claudia.beleites at ipht-jena.de (Claudia Beleites) Date: Thu, 5 May 2011 15:01:28 +0200 Subject: [R] Outlier removal by Principal Component Analysis : error message In-Reply-To: <1304524556976-3496023.post@n4.nabble.com> References: <1304524556976-3496023.post@n4.nabble.com> Message-ID: <4DC29FA8.6070902@ipht-jena.de> Dear Boule, thank you for your interest in hyperSpec. In order to look into your *problem* I need some more information. I suggest that we solve the error off-list. 
Please note also that hyperSpec has its own help mailing list: hyperspec-help at lists.r-forge.r-project.org (due to the amount of spam I got to moderate, you need to subscribe first here: https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/hyperspec-help) - Which version of hyperSpec do you use? If it is the version from CRAN, could you please update to the development version at r-forge with install.packages("hyperSpec",repos="http://R-Forge.R-project.org") ? - Next, if the problem persists with the latest build, could you send me the raw data file so that I can exactly reproduce your problem? - Also, for tracking down the exact source of the error, please execute traceback () after you got the error and email me its output. It is basically impossible to give general recommendations about *Outlier detection*: a few spectra that are very different from all other spectra may be outliers or they may be the target of a study... This is also why the example in the vignette uses a two step procedure: PCA only identifies suspects, i.e. spectra that have very different scores than all others for some principal components. The second step is a manually supervised decision whether the spectrum is really an outlier. The first step could be replaced by other measures that however depend on your data. E.g. if you expect/know your data to consist of different clusters, suspects could be spectra that are too far away from any cluster. If your data comes from a mixture of a few components, spectra that cannot be modeled decently by a few PLS components could be suspicious. Or spectra that require an own component, ... Some kinds of outliers are actually well-defined in a spectroscopic sense, e.g. contamination by fluorescent lamp light. The second step could be replaced by an automatic decision, e.g. with a distance threshold. Personally, I rather use the term filtering for such automatic rules. 
And there you can think about any number of rules your spectra must comply with in order to be acceptable: signal to noise ratio, minimal and maximal intensity, original offset (baseline) less than, ... Hope that helps, Claudia > I am currently analysis Raman spectroscopic data with the hyperSpec package. > I consulted the documentation on this package and I found an example > work-flow dedicated to Raman spectroscopy (see the address : > http://hyperspec.r-forge.r-project.org/chondro.pdf) > > I am currently trying to remove outliers thanks to PCA just as they did in > the documentation, but I get a message error I can't explain. Here is my > code : > > "#import the data : > T=read.table('bladder bis concatenation colonne.txt',header=TRUE) > spec=new("hyperSpec",wavelength=T[,1],spc=t(T[,-1]),data=data.frame(sample=colnames(T[,-1])),label=list(.wavelength="Raman > shift (cm-1)",spc="Intensity (a.u.)")) > > #baseline correction of the spectra > spec=spec[,,500~1800] > bl=spc.fit.poly.below(spec) > spec=spec-bl > > #normalization of the spectra > spec=sweep(spec,1,apply(spec,1,mean),'/') > > #PCA > pca=prcomp(~ spc,data=spec$.,center=TRUE) > scores=decomposition(spec,pca$x,label.wavelength="PC",label.spc="score/a.u.") > loadings=decomposition(spec,t(pca$rotation),scores=FALSE,label.spc="laoding > I/a.u.") > > #plot the scores of the first 20 PC against all other to have an idea where > to find the outliers > pairs(scores[[,,1:20]],pch=19,cex=0.5) > > #identify the outliers thanks to "map.identify" > out=map.identify(scores[,,5]) > Erreur dans `[.data.frame`(x at data, , j, drop = FALSE) : > undefined columns selected > > Does anybody understand where the problem comes from ? > And does anybody know another mean to find spectra outliers ? > > Thank you in advance. 
> > Boule > > -- > View this message in context: http://r.789695.n4.nabble.com/Outlier-removal-by-Principal-Component-Analysis-error-message-tp3496023p3496023.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Claudia Beleites Spectroscopy/Imaging Institute of Photonic Technology Albert-Einstein-Str. 9 07745 Jena Germany email: claudia.beleites at ipht-jena.de phone: +49 3641 206-133 fax: +49 2641 206-399 From petr.pikal at precheza.cz Thu May 5 15:03:37 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Thu, 5 May 2011 15:03:37 +0200 Subject: [R] Odp: Alter a line in a file. In-Reply-To: <1304597763938-3498187.post@n4.nabble.com> References: <1304597763938-3498187.post@n4.nabble.com> Message-ID: Hi r-help-bounces at r-project.org napsal dne 05.05.2011 14:16:04: > Joel > Odeslal: r-help-bounces at r-project.org > > 05.05.2011 14:16 > > > Hi all R users > > Ive got a file that contains diffrent settings in the manor of: What file, what is its structure, is it some R object or separate file? What did you try and what went wrong? Regards Petr > > setting1="value1" > setting2="value2" > setting3="value3" > setting4="value4" > . > . > . > > What I want to do is open the file and change the value of a specific > setting > like wanna change setting4="value4" -> setting4="value5" and then save the > file again. > > setting1="value1" > setting2="value2" > setting3="value3" > setting4="value5" > . > . > . > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Alter-a-line- > in-a-file-tp3498187p3498187.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From bt_jannis at yahoo.de Thu May 5 15:07:57 2011 From: bt_jannis at yahoo.de (Jannis) Date: Thu, 5 May 2011 14:07:57 +0100 (BST) Subject: [R] Alter a line in a file. In-Reply-To: <1304597763938-3498187.post@n4.nabble.com> Message-ID: <691713.605.qm@web28214.mail.ukl.yahoo.com> Well your question is quite general the solution would involve several steps. Probably the easiest solution would be to read the data in as a dataframe (using read.table()) and using the '=' as the separator of the columns. Then change the desired values in the dataframe and save it back as a *.csv file, again using sep='='. Another option would be to read the data as a text string and use regexpressions to replace certain strings. Hope that gets you started Jannis --- Joel schrieb am Do, 5.5.2011: > Von: Joel > Betreff: [R] Alter a line in a file. > An: r-help at r-project.org > Datum: Donnerstag, 5. Mai, 2011 12:16 Uhr > Hi all R users > > Ive got a file that contains diffrent settings in the manor > of: > > setting1="value1" > setting2="value2" > setting3="value3" > setting4="value4" > . > . > . > > What I want to do is open the file and change the value of > a specific > setting > like wanna change setting4="value4" -> setting4="value5" > and then save the > file again. > > setting1="value1" > setting2="value2" > setting3="value3" > setting4="value5" > . > . > . > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Alter-a-line-in-a-file-tp3498187p3498187.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > From joda2457 at student.uu.se Thu May 5 15:13:34 2011 From: joda2457 at student.uu.se (Joel) Date: Thu, 5 May 2011 06:13:34 -0700 (PDT) Subject: [R] Alter a line in a file. In-Reply-To: References: <1304597763938-3498187.post@n4.nabble.com> Message-ID: <1304601214942-3498316.post@n4.nabble.com> jholtman wrote: > > a <- readLines(textConnection('setting1="value1" > setting2="value2" > setting3="value3" > setting4="value4"')) > closeAllConnections() > # change values > ac <- sub('setting4="value4"', 'setting4="value5"', a) > writeLines(ac, con='myFile.txt') > Problem is that I dont know the value on all the settings that I wanna change otherwise that looks like something to continue on. Petr Pikal wrote: > > What file, what is its structure, is it some R object or separate file? > What did you try and what went wrong? > > Regards > Petr > Just a normal textfile nothing fancy Ive tried diffrent kind of ways of useing my OS witch is linux by the system command to do it for me but Im not good enought on regexp to get it to work properly. -- View this message in context: http://r.789695.n4.nabble.com/Alter-a-line-in-a-file-tp3498187p3498316.html Sent from the R help mailing list archive at Nabble.com. From petr.pikal at precheza.cz Thu May 5 15:39:33 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Thu, 5 May 2011 15:39:33 +0200 Subject: [R] Alter a line in a file. 
In-Reply-To: <1304601214942-3498316.post@n4.nabble.com> References: <1304597763938-3498187.post@n4.nabble.com> <1304601214942-3498316.post@n4.nabble.com> Message-ID: Hi r-help-bounces at r-project.org napsal dne 05.05.2011 15:13:34: > Joel > Odeslal: r-help-bounces at r-project.org > > 05.05.2011 15:13 > > > jholtman wrote: > > > > a <- readLines(textConnection('setting1="value1" > > setting2="value2" > > setting3="value3" > > setting4="value4"')) > > closeAllConnections() > > # change values > > ac <- sub('setting4="value4"', 'setting4="value5"', a) > > writeLines(ac, con='myFile.txt') > > > > Problem is that I dont know the value on all the settings that I wanna > change otherwise that looks like something to continue on. > But in that case how would you like to select the setting? > > > Petr Pikal wrote: > > > > What file, what is its structure, is it some R object or separate file? > > What did you try and what went wrong? > > > > Regards > > Petr > > > > Just a normal textfile nothing fancy > Ive tried diffrent kind of ways of useing my OS witch is linux by the system > command to do it for me but Im not good enought on regexp to get it to work > properly. I read the simple text file by read.table > data V1 1 setting1="value1" 2 setting2="value2" 3 setting3="value3" 4 setting4="value4" > grep("4", data$V1) [1] 4 Regards Petr > > -- > View this message in context: http://r.789695.n4.nabble.com/Alter-a-line- > in-a-file-tp3498187p3498316.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
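For the case raised in this thread, where the current value of the setting is not known in advance, a minimal sketch is to match on the setting name only and rewrite the whole line. The helper name set_value and the file name settings.txt are assumptions made for illustration; they do not come from the posts above. The sketch also assumes that the key contains no regular-expression metacharacters and that the new value contains no backslashes.

```r
# Sketch: replace the value of one setting without knowing its old value.
# Each element of 'lines' is expected to look like  key="value".
set_value <- function(lines, key, value) {
  # Anchor on the key followed by "=", so "setting4" does not match "setting40".
  pattern <- paste0("^", key, "=.*$")
  sub(pattern, paste0(key, '="', value, '"'), lines)
}

# Example on an in-memory copy of the settings shown in the thread.
cfg <- c('setting1="value1"', 'setting2="value2"',
         'setting3="value3"', 'setting4="value4"')
new_cfg <- set_value(cfg, "setting4", "value5")
print(new_cfg)

# On a real file (hypothetical path) the same idea would be:
# writeLines(set_value(readLines("settings.txt"), "setting4", "value5"),
#            "settings.txt")
```

Lines that do not start with the given key are returned unchanged, so the rest of the file is preserved as it was.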
From pdalgd at gmail.com Thu May 5 15:52:45 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Thu, 5 May 2011 15:52:45 +0200 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: References: <8620F7B8-6856-4E48-A8C8-5D95D210AFDF@gmail.com> Message-ID: <090384AF-1451-4719-8B24-8D3F4707AEB3@gmail.com> On May 5, 2011, at 10:58 , JP wrote: > On 4 May 2011 15:32, peter dalgaard wrote: > >> >> On May 4, 2011, at 15:11 , JP wrote: >> >>> Peter thanks for the fantastically simple and understandable explanation... >>> >>> To sum it up... to find the z values of a number of pairwise wilcox >>> tests do the following: >>> >>> # pairwise tests with bonferroni correction >>> x <- pairwise.wilcox.test(a, b, alternative="two.sided", >>> p.adj="bonferroni", exact=F, paired=T) >> >> >> You probably don't want the bonferroni correction there. Rather p.adj="none". You generally correct the p values for multiple testing, not the test statistics. >> > > Oh, I see thanks... of course since I have 5 groups (samples) and 10 > comparisons I still have to correct when quoting p values... > > >> (My sentiment would be to pick apart the stats:::wilcox.test.default function and clone the computation of Z from it, but presumably backtracking from the p value is a useful expedient.) >> > > Should this be so onerous for the user [read non-statistician] ? My main reservation is that if the p value gets very small or close to 1 (for 1-sided tests), then it might not be possible to reconstruct the underlying Z (qnorm(pnorm(Z)) breaks down around Z = -37 and Z=+8). If you get sensible values, there's probably not a problem. 
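The reservation above about reconstructing Z from a p value can be illustrated with a short round trip. This is only a sketch: the z values are made up for illustration, and the log-scale variant (the log.p argument of pnorm and qnorm) is an addition that was not suggested in the thread.

```r
# Round trip Z -> p -> Z on the ordinary probability scale.
z <- c(-40, -5, -2, 2, 5, 9)
p <- pnorm(z)
back <- qnorm(p)
# Far out in the tails p underflows to 0 or rounds to 1,
# so qnorm() can only return -Inf or Inf there.

# The same round trip on the log scale keeps the extreme lower tail usable.
back_log <- qnorm(pnorm(z, log.p = TRUE), log.p = TRUE)

print(cbind(z, p, back, back_log))
```

For moderate values the two round trips agree, which is the "sensible values" situation described above.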
> >>> # what is the data structure we got back >>> is.matrix(x$p.value) >>> # p vals >>> x$p.value >>> # z.scores for each >>> z.score <- qnorm(x$p.value / 2) >>> >> >> Hmm, you're not actually getting a signed z out of this, you might want to try alternative="greater" and drop the division by 2 inside qnorm(). (If the signs come out inverted, I meant "less" not "greater"...) >> > > But I need a two sided test (changing the alternative would change the > hypothesis!)... do I still do this? For the z's, yes. Hmm, the asymmetry of qnorm(pnorm(...)) indicates that you might want to be a bit more careful. > All my z values are negative.... > > Is this correct? No, that's the problem. The sign of Z should depend on whether the signed ranks are predominantly positive or negative. If you do a two-sided p-value and divide by two, then by definition you get something less than .5 and qnorm of that will be negative. So if the 2-sided p is 0.04, the one-sided p will be either 0.02 or 0.98, depending on the alternative and on whether the V statistic is above or below its expectation. Notice that this corresponds to Z statistics of opposite sign: > qnorm(.02) [1] -2.053749 > qnorm(.98) [1] 2.053749 -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From djsloan at liv.ac.uk Thu May 5 16:01:24 2011 From: djsloan at liv.ac.uk (dereksloan) Date: Thu, 5 May 2011 07:01:24 -0700 (PDT) Subject: [R] Using functions/loops for repetitive commands In-Reply-To: <1304590606704-3498006.post@n4.nabble.com> References: <1304590606704-3498006.post@n4.nabble.com> Message-ID: <1304604084040-3498427.post@n4.nabble.com> Your code may be untested but it works - also helping me slowly to start understanding how to write functions. Thank you. However I still have difficulty. I also have some categorical variables to analyse by age & hiv status - i.e. 
my dataset expands to (for example):

id  sex  hiv  age  famsize  bmi  resprate  smoker  alcohol
1   M    Pos  23   2        16   15        Y       Y
2   F    Neg  24   5        18   14        Y       Y
3   F    Pos  56   14       23   24        Y       N
4   F    Pos  67   3        33   31        N       N
5   M    Neg  34   2        21   23        N       N

Using the template for the code you sent me, I thought I could analyse the categorical variables by sex & hiv status using a chi-squared test. Long-hand this would be:

chisq.test(smoker,sex)
chisq.test(alcohol,sex)
chisq.test(smoker,hiv)
chisq.test(alcohol,hiv)

Again I wanted to use a function to automate it and thought I could write:

categ<-c(smoker,alcohol)
group.name<-c(sex,hiv)
bl.chisq<-function(categ,group.name,){
lapply(categ, function(y){
form2<-as.formula(paste(y,group.name))
chisq.test(form2,)
})
}
bl.chisq(categ,group.name,)

but I get an error message:

Error in parse(text = x) : unexpected symbol in "smoker sex"

What is wrong with the code? Is it because the wilcox.test is a formula (with a ~ symbol for modelling) whilst the chisq.test simply requires me to list raw data? If so, how can I change my code to automate the chisq.test in the same way I did for the wilcox.test?

Many thanks for any help!

Derek

--
View this message in context: http://r.789695.n4.nabble.com/Using-functions-loops-for-repetitive-commands-tp3498006p3498427.html
Sent from the R help mailing list archive at Nabble.com.
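One possible way to automate the chi-squared tests asked about above, offered as a sketch rather than a definitive answer: chisq.test() has no formula interface, and paste(y, group.name) produces a string such as "smoker sex" that cannot be parsed as a formula, which matches the error shown. The sketch passes column names as character strings and cross-tabulates the columns instead. The data frame dat below simply retypes the five example rows from the post; the argument names data and groups are assumptions for illustration.

```r
# The five example rows from the post, categorical columns only.
dat <- data.frame(
  sex     = c("M", "F", "F", "F", "M"),
  hiv     = c("Pos", "Neg", "Pos", "Pos", "Neg"),
  smoker  = c("Y", "Y", "Y", "N", "N"),
  alcohol = c("Y", "Y", "N", "N", "N"),
  stringsAsFactors = TRUE
)

categ  <- c("smoker", "alcohol")  # column names, quoted
groups <- c("sex", "hiv")

# Run chisq.test() on every (variable, grouping) pair of columns.
bl.chisq <- function(data, categ, groups) {
  combos <- expand.grid(var = categ, group = groups,
                        stringsAsFactors = FALSE)
  res <- lapply(seq_len(nrow(combos)), function(i) {
    chisq.test(table(data[[combos$var[i]]], data[[combos$group[i]]]))
  })
  names(res) <- paste(combos$var, combos$group, sep = " by ")
  res
}

# With only five rows R warns that the approximation may be incorrect;
# the warnings are silenced here because the data are purely illustrative.
results <- suppressWarnings(bl.chisq(dat, categ, groups))
print(sapply(results, function(r) r$p.value))
```

The result is a named list of test objects, one per pair, so any component (statistic, p value, expected counts) can be pulled out afterwards with sapply() or lapply().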
From jeanpaul.ebejer at inhibox.com Thu May 5 16:15:20 2011 From: jeanpaul.ebejer at inhibox.com (JP) Date: Thu, 5 May 2011 15:15:20 +0100 Subject: [R] Simple General Statistics and R question (with 3 line example) - get z value from pairwise.wilcox.test In-Reply-To: <090384AF-1451-4719-8B24-8D3F4707AEB3@gmail.com> References: <8620F7B8-6856-4E48-A8C8-5D95D210AFDF@gmail.com> <090384AF-1451-4719-8B24-8D3F4707AEB3@gmail.com> Message-ID: Thanks once again Peter, I understand your points -- I fiddled and googled and read some more and found an eas(ier) route: library(coin) t <- wilcoxsign_test(a ~ b, alternative = "two.sided", distribution = exact()) # This is equivalent to paired wilcox.test pval <- pvalue(t) sweet.zscore <- statistic(t) # signed and everything # correct for multiple testing if you are doing a number of the above... I would never have got here without your guidance - so kudos to you. JP On 5 May 2011 14:52, peter dalgaard wrote: > > On May 5, 2011, at 10:58 , JP wrote: > >> On 4 May 2011 15:32, peter dalgaard wrote: >> >>> >>> On May 4, 2011, at 15:11 , JP wrote: >>> >>>> Peter thanks for the fantastically simple and understandable explanation... >>>> >>>> To sum it up... to find the z values of a number of pairwise wilcox >>>> tests do the following: >>>> >>>> # pairwise tests with bonferroni correction >>>> x <- pairwise.wilcox.test(a, b, alternative="two.sided", >>>> p.adj="bonferroni", exact=F, paired=T) >>> >>> >>> You probably don't want the bonferroni correction there. Rather p.adj="none". You generally correct the p values for multiple testing, not the test statistics. >>> >> >> Oh, I see thanks... of course since I have 5 groups (samples) and 10 >> comparisons I still have to correct when quoting p values... >> >> >>> (My sentiment would be to pick apart the stats:::wilcox.test.default function and clone the computation of Z from it, but presumably backtracking from the p value is a useful expedient.) 
>>> >> >> Should this be so onerous for the user [read non-statistician] ? > > > My main reservation is that if the p value gets very small or close to 1 (for 1-sided tests), then it might not be possible to reconstruct the underlying Z (qnorm(pnorm(Z)) breaks down around Z = -37 and Z=+8). If you get sensible values, there's probably not a problem. > > >> >>>> # what is the data structure we got back >>>> is.matrix(x$p.value) >>>> # p vals >>>> x$p.value >>>> # z.scores for each >>>> z.score <- qnorm(x$p.value / 2) >>>> >>> >>> Hmm, you're not actually getting a signed z out of this, you might want to try alternative="greater" and drop the division by 2 inside qnorm(). (If the signs come out inverted, I meant "less" not "greater"...) >>> >> >> But I need a two sided test (changing the alternative would change the >> hypothesis!)... ?do I still do this? > > For the z's, yes. > > Hmm, the asymmetry of qnorm(pnorm(...)) indicates that you might want to be a bit more careful. > >> All my z values are negative.... >> >> Is this correct? > > No, that's the problem. The sign of Z should depend on whether the signed ranks are predominantly positive or negative. If you do a two-sided p-value and divide by two, then by definition you get something less than .5 and qnorm of that will be negative. So if the 2-sided p is 0.04, the one-sided p will be either 0.02 or 0.98, depending on the alternative and on whether the V statistic is above or below its expectation. 
Notice that this corresponds to Z statistics of opposite sign: > >> qnorm(.02) > [1] -2.053749 >> qnorm(.98) > [1] 2.053749 > > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk ?Priv: PDalgd at gmail.com > > > From dwinsemius at comcast.net Thu May 5 16:25:15 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 10:25:15 -0400 Subject: [R] Question about error of "non-numeric argument to binary operator" In-Reply-To: <000001cc0ae5$b4a82a10$1df87e30$@gmail.com> References: <000001cc0ae5$b4a82a10$1df87e30$@gmail.com> Message-ID: On May 5, 2011, at 1:31 AM, Maximo Polanco wrote: > I have been trying to do a nls model and gives me the error of a > nonnumeric > argument > > This is my data set > > TT > > FFT > > V > > C > > table(file="c:/tt2.txt",header=T) > >> fit.model <- nls(TT~60*(1+alpha*(v/c)^beta),data=tt2, >> start=list(alpha=1, > beta=3, v=1000)) > > Error in v/c : non-numeric argument to binary operator You have (perhaps) defined a column name of "C" and then attempted to reference it with "c". Since `c` is a (rather fundamental) function in R, the interpreter has no problem accessing `c` but trying to take the ratio of a scalar to a function fails. You can try changing "c" to "C" above and it may "work", but in the future, please heed the issues raised at the bottom of this trimmed message! (It would be much better to not use either "C" or "c" for object names since they are both function names.) ?"C" C {stats} R Documentation Sets Contrasts for a FactorDescription Sets the "contrasts" attribute for the factor. Usage C(object, contr, how.many, ...) > >> is.numeric(tt2) > > [1] FALSE > >> is.character(tt2) > > [1] FALSE > >> as.numeric(tt2) > > Error: (list) object cannot be coerced to type 'double' > >> > > This is my data set > > TT > > FFT > > V > > C > > > 1 > > 70.475 > > 60 snipped long ... 
incorrectly formatted data due to failing to post in plain text. > > > > > [[alternative HTML version deleted]] *************************** > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ########################### -- David Winsemius, MD West Hartford, CT From krostir at gmail.com Thu May 5 12:15:36 2011 From: krostir at gmail.com (Kristof Ostir) Date: Thu, 5 May 2011 12:15:36 +0200 Subject: [R] Using error in histograms Message-ID: Hello! I am trying to produce a histogram of measurement data (orientation of archaeological structures) that are a subject to measurement error. The normal histogram just computes frequencies, but does not take into account that a particular value is spread over a range of values (in my case the spread is different for reach measurement and is larger than the bin size). The closest approach is kernel density estimation (in the image is a comparison of histogram and KDE): http://en.wikipedia.org/wiki/Kernel_density_estimation http://en.wikipedia.org/wiki/File:Comparison_of_1D_histogram_and_KDE.png However in my case the kernel size is different for each value. I wrote a program in IDL that performs the plotting, but am just wondering if such a function is available in R. Basically it is a problem of summing (and later plotting) several data distributions. I would appreciate also any hint to a book that might be dealing with the problem. I am not an expert in statistics and I might not be using use the correct terminology in web searches. Regards, Kristof From crosspide at hotmail.com Thu May 5 12:25:14 2011 From: crosspide at hotmail.com (agent dunham) Date: Thu, 5 May 2011 03:25:14 -0700 (PDT) Subject: [R] lm weight adj r-sq Message-ID: <1304591114461-3498020.post@n4.nabble.com> Dear all, First of all apologies, I'm pretty newbie, and secondly I know this is not a question for the forum but I'd be very grateful if you'd help me. 
I had problems with normality assumptions, and I tried with weights in the linear regression this way: modelw <-lm(log(v1)~v2+log(v2)+log(v3)+v4, data=dat, weight=1/group.var^2) I'd like to know how interpret/compute adj R-sq/mse from modelw, in order to compare it to the adj R-sq/mse from model (model <- lm(log(v1)~v2+log(v2)+log(v3)+v4, data=dat) Thanks in advance, user at host.com pd I set the weights following this notes: (I was reading here: statistics.unl.edu/faculty/bilder/stat870/schedule/Chapter%2011.doc http://statistics.unl.edu/faculty/bilder/) -- View this message in context: http://r.789695.n4.nabble.com/lm-weight-adj-r-sq-tp3498020p3498020.html Sent from the R help mailing list archive at Nabble.com. From ananta.acharya at gmail.com Thu May 5 12:34:25 2011 From: ananta.acharya at gmail.com (antu) Date: Thu, 5 May 2011 03:34:25 -0700 (PDT) Subject: [R] distance matrix Message-ID: <1304591665763-3498033.post@n4.nabble.com> Hello all, I am wondering if there is anyway to create distance matrix for replicated data for example, I have a data like sample pop id var1 var2 var3 var4 1.1 1 a 1 1 0 1 1.2 1 a 0 0 1 0 1.3 1 a 1 1 0 1 2.1 2 b 0 0 1 0 2.2 2 b 1 1 1 1 2.3 2 b 0 1 0 0 2.4 2 b 1 0 1 1 3.1 3 a 0 1 0 1 3.2 3 a 1 1 1 0 3.3 3 a 0 0 0 0 dist(data[,c(4:7)] gives the distance of samples, but I also need the distances of pop ie (1,2,3) and also id (a,b) how can I achieve this?? Thanks -- View this message in context: http://r.789695.n4.nabble.com/distance-matrix-tp3498033p3498033.html Sent from the R help mailing list archive at Nabble.com. From shekhar2581 at gmail.com Thu May 5 12:54:39 2011 From: shekhar2581 at gmail.com (Shekhar) Date: Thu, 5 May 2011 03:54:39 -0700 (PDT) Subject: [R] How to fit a random data into Beta distribution? 
In-Reply-To: References: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com>, , , Message-ID: <4b85497d-d2fb-4ce2-b633-9ab4f1385675@34g2000pru.googlegroups.com> Hi, @Steven: Since Beta distribution is a generic distribution by which i mean that by varying the parameter of alpha and beta we can fit any distribution. So to check this i generated a random data from Normal distribution like x.norm<-rnorm(n=100,mean=10,sd=10); Now i want to estimate the paramters alpha and beta of the beta distribution which will fit the above generated random data. That's what i want to do. @Ali: When you said you drafted your own procedure, do you mean that you are calculate the parameters using MLE or bayesian..???Can you please give me some more ideas into this? Thanks and Regards, Som Shekhar From shekhar2581 at gmail.com Thu May 5 13:02:41 2011 From: shekhar2581 at gmail.com (Shekhar) Date: Thu, 5 May 2011 04:02:41 -0700 (PDT) Subject: [R] Remove all whitespaces In-Reply-To: <1304585445836-3497867.post@n4.nabble.com> References: <1304585445836-3497867.post@n4.nabble.com> Message-ID: <13eee0db-d983-4b9d-98f9-9879c6f6c8e8@d26g2000prn.googlegroups.com> A more elegant way would be: myString<-"1 2 3 4 5" myString<-paste(unlist(strsplit(myString," ")),collapse="") The output will be "12345" Regards, Som Shekhar From marchywka at hotmail.com Thu May 5 13:37:04 2011 From: marchywka at hotmail.com (Mike Marchywka) Date: Thu, 5 May 2011 07:37:04 -0400 Subject: [R] nls problem with R In-Reply-To: <1304583633413-3497825.post@n4.nabble.com> References: <1304482083098-3494454.post@n4.nabble.com>, <20110504071506.GU48756@ms.unimelb.edu.au>, <1304518064519-3495672.post@n4.nabble.com>, , <1304583633413-3497825.post@n4.nabble.com> Message-ID: ---------------------------------------- > Date: Thu, 5 May 2011 01:20:33 -0700 > From: sterlesser at hotmail.com > To: r-help at r-project.org > Subject: Re: [R] nls problem with R > > ID1 ID2 t V(t) > 1 1 0 6.053078443 > 2 1 0.3403 5.56937391 
> 3 1 0.4181 5.45484486
> 4 1 0.4986 5.193124598
> 5 1 0.7451 4.31386722
> 6 1 1.0069 3.645422269
> 7 1 1.5535 3.587710965
> 8 1 1.8049 3.740362689
> 9 1 2.4979 3.699837726
> 10 1 6.4903 2.908485019
> 11 1 13.5049 1.888179494
> 12 1 27.5049 1.176091259
> 13 1 41.5049 1.176091259
>
> The model
> (1) V(t)=V0[1-epi+ epi*exp(-c(t-t0))]

With A=V0, B=V0*epi, C=exp(c*t0), this is

V(t)=A-B+B*C*exp(-ct)

or further, with D=A-B, F=B*C,

V(t)=D+F*exp(-ct)

This model only really has 3 attributes: initial value, final value, and decay constant, yet you ask for 4 parameters. There is no way to get a unique answer. For some reason this same form comes up a lot here; I think this is about the third time I've seen this in the last few weeks. I guess when fishing or shopping for forms to fit, it is tempting to throw a bunch of parameters into your model, but this can create intractable ambiguities. Indeed, if I just remove t0 and use your first 8 points I get this (random starting values, but it converged easily; you still need to plot etc.)

[1] "1 v= 8.77181162126362 epi= 0.672516376478598 cl= 1.90973175223917 t0= 0.643481321167201"
> summary(nls2)

Formula: V2 ~ v0 * (1 - epi + epi * exp(-cl * (T2)))

Parameters:
     Estimate Std. Error t value Pr(>|t|)
v0     6.2901     0.3384  18.585  8.3e-06 ***
epi    0.5430     0.1373   3.955   0.0108 *
cl     0.9684     0.5491   1.763   0.1381
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
0.1 ' ' 1

Residual standard error: 0.3579 on 5 degrees of freedom

Number of iterations to convergence: 11
Achieved convergence tolerance: 4.057e-06

> (2) V(t)=V0{A*exp[-lambda1(t-t0)]+(1-A)*exp[-lambda2(t-t0)]}
>
> in formula (2) lambda1=0.5*{(c+delta)+[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
>
> lambda2=0.5*{(c+delta)-[(c-delta)^2+4*(1-epi)*c*delta]^0.5}
> A=(epi*c-lambda2)/(lambda1-lambda2)
>
> The regression rule :
> for formula (1):(t<=2,that is) first 8 rows are used for non-linear
> regression
> epi,c,t0,V0 parameters are obtained
> for formula (2):all 13 rows of results are used for non-linear regression
> lambda1,lambda2,A (with these parameters, delta can be calculated from them)
>
> Thanks for help
> Ster Lesser
>
> --
> View this message in context: http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3497825.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From ananta.acharya at gmail.com Thu May 5 13:37:38 2011
From: ananta.acharya at gmail.com (antu)
Date: Thu, 5 May 2011 04:37:38 -0700 (PDT)
Subject: [R] Convert a presence/ absence matrix to a list of presence only
In-Reply-To:
References: <1303495813826-3468479.post@n4.nabble.com>
Message-ID: <1304595458340-3498116.post@n4.nabble.com>

Thank you, it perfectly worked, except I had to modify some codes on the rep("", maxLen - length(sp)) because it gave me some error, thanks

--
View this message in context: http://r.789695.n4.nabble.com/Convert-a-presence-absence-matrix-to-a-list-of-presence-only-tp3468479p3498116.html
Sent from the R help mailing list archive at Nabble.com.
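For the nls thread above (formula (1), fitted to the rows with t <= 2), the three-parameter fit without t0 can be sketched as follows. The data are the first 8 rows as posted; the column names time and V, the object names d and fit, and the starting values are assumptions chosen for illustration. The starting values sit near the estimates reported in the reply; whether other starts converge has not been checked here.

```r
# First 8 rows (t <= 2) of the data posted in the nls thread.
d <- data.frame(
  time = c(0, 0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.5535, 1.8049),
  V    = c(6.053078443, 5.56937391, 5.45484486, 5.193124598,
           4.31386722, 3.645422269, 3.587710965, 3.740362689)
)

# Three-parameter form: t0 is dropped because it cannot be separated
# from the other parameters (it only rescales the exponential term).
fit <- nls(V ~ v0 * (1 - epi + epi * exp(-cl * time)),
           data  = d,
           start = list(v0 = 6, epi = 0.5, cl = 1))

print(coef(fit))

# Plot data and fitted curve before trusting the estimates:
# plot(V ~ time, data = d)
# lines(d$time, fitted(fit))
```

Adding t0 back as a fourth parameter would be expected to produce a singular-gradient style failure or arbitrary estimates, since the curve has only an initial value, a final value and a decay constant to determine.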
From matibie at gmail.com Thu May 5 13:50:16 2011
From: matibie at gmail.com (matibie)
Date: Thu, 5 May 2011 04:50:16 -0700 (PDT)
Subject: [R] Insert values to histogram
Message-ID: <1304596216055-3498140.post@n4.nabble.com>

I'm trying to add the exact value on top of each column of a histogram. I have been trying with the text function, but it doesn't work. The problem is that the program itself decides the exact value to give to each column, so, unlike in a bar plot, I don't know exactly which values are being plotted. Does anyone have an idea how to do this?

Thanks
Matias

--
View this message in context: http://r.789695.n4.nabble.com/Insert-values-to-histogram-tp3498140p3498140.html
Sent from the R help mailing list archive at Nabble.com.

From momadou at yahoo.fr Thu May 5 13:51:37 2011
From: momadou at yahoo.fr (Komine)
Date: Thu, 5 May 2011 04:51:37 -0700 (PDT)
Subject: [R] Draw a nomogram after glm
Message-ID: <1304596297220-3498144.post@n4.nabble.com>

Hi all R users,

I did a logistic regression with my binary variable Y (0/1) and 2 explanatory variables. Now I am trying to draw my nomogram with predicted values. I looked at the R help, but I have trouble understanding the example. When I use the glm function I have a problem, so I use lrm. My code is:

modele<-lrm(Y~L+P,data=donnee)
fun<- function(x) plogis(x-modele$coef[1]+modele$coef[2])
f <- Newlabels(modele,c(L="poids",P="taille"))
nomogram(f, fun=list('Prob Y<=1'=plogis), fun.at=c(seq(0,1,by=.1),.95,.99), lmgp=.1, cex.axis=.6)
fun.at=c(.01,.05,seq(.1,.9,by=.1),.95,.99), lmgp=.2, cex.axis=.6)
options(Fire=NULL)

The result is bad and I get the following error message:

Error in value.chk(at, i, NA, -nint, Limval, type.range = "full") : variable L does not have limits defined by datadist

Could you help me with the code to draw the nomogram?

NB: my English is poor, I apologize.
Thanks for your help
Komine

--
View this message in context: http://r.789695.n4.nabble.com/Draw-a-nomogram-after-glm-tp3498144p3498144.html
Sent from the R help mailing list archive at Nabble.com.

From elodie.chapeaublanc at curie.fr Thu May 5 13:56:28 2011
From: elodie.chapeaublanc at curie.fr (elodie)
Date: Thu, 5 May 2011 04:56:28 -0700 (PDT)
Subject: [R] heatmap.3
In-Reply-To: <1267026781766-1567638.post@n4.nabble.com>
References: <1266960784732-1566584.post@n4.nabble.com> <1267026781766-1567638.post@n4.nabble.com>
Message-ID: <1304596588645-3498156.post@n4.nabble.com>

I am interested in your heatmap function (allowing a matrix in the ColSideColors option). Could you share the complete code of your function?

Thanks

--
View this message in context: http://r.789695.n4.nabble.com/heatmap-3-tp1566584p3498156.html
Sent from the R help mailing list archive at Nabble.com.

From shekhar2581 at gmail.com Thu May 5 13:41:45 2011
From: shekhar2581 at gmail.com (Shekhar)
Date: Thu, 5 May 2011 04:41:45 -0700 (PDT)
Subject: [R] Using functions/loops for repetitive commands
In-Reply-To: <1304590606704-3498006.post@n4.nabble.com>
References: <1304590606704-3498006.post@n4.nabble.com>
Message-ID: <682a4065-bf4b-4f7a-ae27-1cc372a92f8d@35g2000prp.googlegroups.com>

Hi Derek,

You can accomplish your loop jobs by the following means:
(a) use a for loop
(b) use a while loop
(c) use lapply, tapply, or sapply (I feel lapply is the elegant way)

---------------For Loop-----------------------------
"for" loops are pretty simple to use and work almost the same as in any other scripting language you know (I am referring to Matlab).

(Example 1) Let's say you know that you have to run 10 iterations; then you can run it as

for(i in 1:10) print(i)   # prints the numbers from 1 to 10

(Example 2) You don't know how many iterations you need to run. The only thing you have is some vector, and you want to do some operation on that vector. You can do something like this:

myVector<-c(20,45,23,45,89)
for(i in seq_along(myVector)) print(myVector[i])

-------------Using lapply-------------------------
In "lapply" you mainly need to provide two things:
(1) First parameter: a vector or some sequence of numbers.
(2) Second parameter: a function, which could be a user-defined function or some built-in function.

lapply will call the function once for every element given in the first parameter. For example:

x<-c(10,20,20)
lapply(seq_along(x),function(i) {
  # your logic
})

Here I have passed seq_along(x) as the first parameter. The outcome of seq_along(x) will be 1, 2, 3. lapply takes each of these numbers in turn and calls the function with it. That means lapply calls the function three times for the current data set, as if you had written f(1), f(2) and f(3), where f is the function you passed in. So the logic inside the function is executed for each and every value specified in the first parameter of lapply.

I hope it helps you in some way. For your problem, I am guessing that you are using a data frame or matrix to store the data and that you want to automate the processing, right? You can try using "lapply"; I think that would be efficient. Let me also try.

Regards,
Som Shekhar

From l.j.bonnett at gmail.com Thu May 5 14:20:38 2011
From: l.j.bonnett at gmail.com (Laura Bonnett)
Date: Thu, 5 May 2011 13:20:38 +0100
Subject: [R] Confidence interval for difference in Harrell's c statistics (or equivalently Somers' D statistics)
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From polly__c at hotmail.com Thu May 5 14:48:45 2011 From: polly__c at hotmail.com (pcc) Date: Thu, 5 May 2011 05:48:45 -0700 (PDT) Subject: [R] Null Message-ID: <1304599725057-3498261.post@n4.nabble.com> This is probably a very simple question but I am completely stumped!I am trying to do shapiro.wilk(x) test on a relatively small dataset(75) and each time my variable and keeps coming out as 'NULL', and > shapiro.test(fcv) Error in complete.cases(x) : no input has determined the number of cases my text file looks like this: case 1.600972896 1.534026106 1.633468456 1.69019608 1.686636269 1.713490543 1.460897843 1.604226053 1.547774705 1.575187845 1.50242712 1.489958479 1.555094449 1.56937391 1.46686762 1.583198774 1.59439255 1.627365857 1.596597096 1.598790507 1.596597096 1.613841822 1.607455023 1.586587305 1.72427587 1.668385917 1.743509765 1.5774918 1.709269961 1.507855872 1.650307523 1.670245853 1.721810615 1.613841822 1.586587305 1.658011397 1.595496222 1.662757832 1.521138084 1.564666064 1.515873844 1.596597096 1.617000341 1.621176282 1.598790507 1.73479983 1.498310554 1.571708832 1.426511261 1.698970004 1.534026106 1.5774918 1.682145076 1.689308859 1.654176542 1.526339277 1.545307116 1.658964843 1.638489257 1.557507202 1.604226053 1.627365857 1.651278014 1.627365857 1.559906625 1.720159303 1.64738297 1.62324929 1.698970004 1.704150517 1.57863921 1.558708571 1.681241237 1.539076099 1.5132176 Any ideas? -- View this message in context: http://r.789695.n4.nabble.com/Null-tp3498261p3498261.html Sent from the R help mailing list archive at Nabble.com. From pamela.santelices at ine.cl Thu May 5 15:25:11 2011 From: pamela.santelices at ine.cl (Pamela Santelices Elgueta) Date: Thu, 5 May 2011 09:25:11 -0400 Subject: [R] RV: R question Message-ID: <8E142E54149A314AAA92382A8D3447B129A5F82F58@EXCLUS2007.ine.cl> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From pangdu at vt.edu Thu May 5 15:37:15 2011 From: pangdu at vt.edu (Pang Du) Date: Thu, 5 May 2011 09:37:15 -0400 Subject: [R] two-way group mean prediction in survreg with three factors In-Reply-To: <20110505021249.GB11866@ms.unimelb.edu.au> References: <20110505021249.GB11866@ms.unimelb.edu.au> Message-ID: <226A90808C6948B9AD39F895F91941B9@statistics.vt.edu> Oops, I hope not too. Don't know why I had the brackets around B+C. My model is actually A*B+C. And I'm not sure how to obtain the two-way prediction of AB with C marginalized. Thanks. Pang -----Original Message----- From: Andrew Robinson [mailto:A.Robinson at ms.unimelb.edu.au] Sent: Wednesday, May 04, 2011 10:13 PM To: Pang Du Cc: r-help at r-project.org Subject: Re: [R] two-way group mean prediction in survreg with three factors I hope not! Facetiousness aside, the model that you have fit contains C, and, indeed, an interaction between A and C. So, the effect of A upon the response variable depends on the level of C. The summary you want must marginalize C somehow, probably by a weighted or unweighted average across its levels. What does that summary really mean? Can you meaningfully average across the levels of a predictor that is included in the model as a main and an interaction term? Best wishes Andrew On Wed, May 04, 2011 at 12:24:50PM -0400, Pang Du wrote: > I'm fitting a regression model for censored data with three categorical > predictors, say A, B, C. My final model based on the survreg function is > > Surv(..) ~ A*(B+C). > > I know the three-way group mean estimates can be computed using the predict > function. But is there any way to obtain two-way group mean estimates, say > estimated group mean for (A1, B1)-group? The sample group means don't > incorporate censoring and thus may not be appropriate here. 
> > > > Pang Du > > Virginia Tech > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From arnau.mir at uib.es Thu May 5 16:42:58 2011 From: arnau.mir at uib.es (Arnau Mir) Date: Thu, 5 May 2011 16:42:58 +0200 Subject: [R] functions pandit and treebase in the package apTreeshape Message-ID: <706AD256-C731-48AB-BFCE-4715BA28ACAF@uib.es> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From rvaradhan at jhmi.edu Thu May 5 16:50:39 2011 From: rvaradhan at jhmi.edu (Ravi Varadhan) Date: Thu, 5 May 2011 10:50:39 -0400 Subject: [R] How to fit a random data into Beta distribution? In-Reply-To: <4b85497d-d2fb-4ce2-b633-9ab4f1385675@34g2000pru.googlegroups.com> References: <5be1cce0-d00b-45f2-adb1-e3682f6bd637@l14g2000pro.googlegroups.com> , <4b85497d-d2fb-4ce2-b633-9ab4f1385675@34g2000pru.googlegroups.com> Message-ID: <79F23BA7BB084E4FA01A8B93904CD02CF669E9F33E@WIGGUMVS.win.ad.jhu.edu> Beta is not as general as you think. Its support is limited to [0,1], but you are trying to fit data that lies outside of its support. Please read about the beta distribution from a basic stats/prob book. Ravi. 
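To make the point about support concrete, here is a minimal sketch, assuming one is willing to rescale the simulated data into the open interval (0, 1) before fitting. The rescaling, the method-of-moments estimator and the names `u`, `alpha.hat` and `beta.hat` are illustrative choices, not something proposed in the thread; the fitted beta then describes the rescaled values, not the original normal sample, and it depends on the observed minimum and maximum.

```r
set.seed(1)
x.norm <- rnorm(n = 100, mean = 10, sd = 10)   # as in the quoted question

# Map the data onto [0, 1], then squeeze slightly so no value sits on 0 or 1
n <- length(x.norm)
u <- (x.norm - min(x.norm)) / (max(x.norm) - min(x.norm))
u <- (u * (n - 1) + 0.5) / n

# Method-of-moments estimates for Beta(alpha, beta):
#   mean = alpha / (alpha + beta)
#   var  = mean * (1 - mean) / (alpha + beta + 1)
m <- mean(u)
v <- var(u)
k <- m * (1 - m) / v - 1          # estimate of alpha + beta
alpha.hat <- m * k
beta.hat  <- (1 - m) * k
c(alpha = alpha.hat, beta = beta.hat)
```

A maximum-likelihood fit could start from these values; the moment estimates are used here only because they need nothing beyond base R.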
________________________________________ From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Shekhar [shekhar2581 at gmail.com] Sent: Thursday, May 05, 2011 6:54 AM To: r-help at r-project.org Subject: Re: [R] How to fit a random data into Beta distribution? Hi, @Steven: Since Beta distribution is a generic distribution by which i mean that by varying the parameter of alpha and beta we can fit any distribution. So to check this i generated a random data from Normal distribution like x.norm<-rnorm(n=100,mean=10,sd=10); Now i want to estimate the paramters alpha and beta of the beta distribution which will fit the above generated random data. That's what i want to do. @Ali: When you said you drafted your own procedure, do you mean that you are calculate the parameters using MLE or bayesian..???Can you please give me some more ideas into this? Thanks and Regards, Som Shekhar ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From alan.fernihough at gmail.com Thu May 5 16:59:37 2011 From: alan.fernihough at gmail.com (Alan Fernihough) Date: Thu, 5 May 2011 15:59:37 +0100 Subject: [R] Conditional distribution plot using Model-based Recursive Partitioning Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From dwinsemius at comcast.net Thu May 5 17:04:21 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 11:04:21 -0400 Subject: [R] Using functions/loops for repetitive commands In-Reply-To: <1304604084040-3498427.post@n4.nabble.com> References: <1304590606704-3498006.post@n4.nabble.com> <1304604084040-3498427.post@n4.nabble.com> Message-ID: <9436384F-B91B-49AF-AFBF-395690C537DE@comcast.net> On May 5, 2011, at 10:01 AM, dereksloan wrote: > Your code may be untested but it works - also helping me slowly to > start > understanding how to write functions. Thank you. > > However I still have difficulty. I also have some categorical > variables to > analyse by age & hiv status - i.e. my dataset expands to (for > example); > > id sex hiv age famsize bmi resprate smoker alcohol > 1 M Pos 23 2 16 15 Y Y > 2 F Neg 24 5 18 14 Y Y > 3 F Pos 56 14 23 24 Y N > 4 F Pos 67 3 33 31 N N > 5 M Neg 34 2 21 23 N N > > > Using the template for the code you sent me I thought I could > analyse the > categorical variables by sex & hiv status using a chiq-squared test; > > Long-hand this would be; > > chisq.test(smoker,sex) > chisq.test(alcohol,sex) > chisq.test(smoker,hiv) > chisq.test(alcohol,hiv) > > Again I wanted to use a function to loop automate it and thought I > could > write; > > categ<-c(smoker,alcohol) > group.name<-c(sex,hiv) > bl.chisq<-function(categ,group.name,){ > lapply(categ, > function(y){ > form2<-as.formula(paste(y,group.name)) I haven't tested it but I suspect you failed to note that Eichner used sep="~" in his paste argument to as.formula(). > chisq.test(form2,) > }) > } > > bl.chisq(categ,group.name,) > > but I get an error message: > > Error in parse(text = x) : unexpected symbol in "smoker sex" > > What is wrong with the code? Is is because the wilcox.test is a > formula > (with a ~ symbol for modelling) whilst the chisq.test simply > requires me to > list raw data? 
If so how can I change my code to automate the > chisq.test in > the same way I did for the wilcox.test? > > Many thanks for any help! > > Derek > > David Winsemius, MD West Hartford, CT From dwinsemius at comcast.net Thu May 5 17:09:54 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 11:09:54 -0400 Subject: [R] Confidence interval for difference in Harrell's c statistics (or equivalently Somers' D statistics) In-Reply-To: References: Message-ID: On May 5, 2011, at 8:20 AM, Laura Bonnett wrote: > Dear All, > > I am trying to calculate a 95% confidence interval for the > difference in two > c statistics (or equivalently D statistics). In Stata I gather that > this > can be done using the lincom command. Is there anything similar in R? Have you looked at rcorrp.cens {Hmisc}? > > [[alternative HTML version deleted]] Please post in plain text. -- David Winsemius, MD West Hartford, CT From crabak at acm.org Thu May 5 17:09:26 2011 From: crabak at acm.org (csrabak) Date: Thu, 5 May 2011 12:09:26 -0300 Subject: [R] distance matrix In-Reply-To: <1304591665763-3498033.post@n4.nabble.com> References: <1304591665763-3498033.post@n4.nabble.com> Message-ID: Em 5/5/2011 07:34, antu escreveu: > Hello all, > > I am wondering if there is anyway to create distance matrix for replicated > data > > for example, > > I have a data like > > sample pop id var1 var2 var3 var4 > 1.1 1 a 1 1 0 1 > 1.2 1 a 0 0 1 0 > 1.3 1 a 1 1 0 1 > 2.1 2 b 0 0 1 0 > 2.2 2 b 1 1 1 1 > 2.3 2 b 0 1 0 0 > 2.4 2 b 1 0 1 1 > 3.1 3 a 0 1 0 1 > 3.2 3 a 1 1 1 0 > 3.3 3 a 0 0 0 0 > > > dist(data[,c(4:7)] gives the distance of samples, but I also need the > distances of pop ie (1,2,3) and also id (a,b) how can I achieve this?? > Just a doubt: does the idea of comparing (ordering) the variable pop and id make sense? Or expressed in more direct way: what would a distance between b and a mean (same for the pop labels)? 
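If "distance between pops" is taken to mean distance between group averages, one possible reading can be sketched as follows. This interpretation is an assumption, as are the names `pop_means`, `id_means`, `d_pop` and `d_id`; the data are the ten rows quoted above. Whether such a distance is meaningful for these labels is exactly the open question raised in the reply.

```r
# The ten rows from the quoted message
dat <- data.frame(
  sample = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 2.4, 3.1, 3.2, 3.3),
  pop    = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  id     = c("a", "a", "a", "b", "b", "b", "b", "a", "a", "a"),
  var1   = c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0),
  var2   = c(1, 0, 1, 0, 1, 1, 0, 1, 1, 0),
  var3   = c(0, 1, 0, 1, 1, 0, 1, 0, 1, 0),
  var4   = c(1, 0, 1, 0, 1, 0, 1, 1, 0, 0)
)
vars <- c("var1", "var2", "var3", "var4")

# Average each variable within pop, then take distances between the averages
pop_means <- aggregate(dat[vars], by = list(pop = dat$pop), FUN = mean)
rownames(pop_means) <- pop_means$pop
d_pop <- dist(pop_means[vars])

# The same idea for id
id_means <- aggregate(dat[vars], by = list(id = dat$id), FUN = mean)
rownames(id_means) <- id_means$id
d_id <- dist(id_means[vars])

d_pop
d_id
```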
From janko.thyson.rstuff at googlemail.com Thu May 5 16:45:13 2011 From: janko.thyson.rstuff at googlemail.com (Janko Thyson) Date: Thu, 05 May 2011 16:45:13 +0200 Subject: [R] Tone in mailing lists (was " issue with "strange" characters (readHTMLTable)") In-Reply-To: <7600dff660597.4dc28b19@rug.nl> References: <7600dff660597.4dc28b19@rug.nl> Message-ID: <4DC2B7F9.9040603@googlemail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From silvano at uel.br Thu May 5 16:41:09 2011 From: silvano at uel.br (Silvano) Date: Thu, 5 May 2011 11:41:09 -0300 Subject: [R] Boxplot in order Message-ID: <4238836B2F8D4598BC9BFD477324F83D@ccePC> Hi, I need construct box plot graph, but I want keep Groups order karla = data.frame( Groups = factor(rep(c('CPre','SPre','C7','S7','C14','S14','C21','S21'), 11)), Time = rep(c(0,7,14,21), 11), Resp = valor ) boxplot(Resp~Groups, order=T) doesn't work. How do this? -------------------------------------- Silvano Cesar da Costa Departamento de Estat?stica Universidade Estadual de Londrina Fone: 3371-4346 From ybaranan at hotmail.com Thu May 5 16:48:03 2011 From: ybaranan at hotmail.com (yoav baranan) Date: Thu, 5 May 2011 17:48:03 +0300 Subject: [R] cross-correlation table with subscript or superscript to indicate significant differences Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From yutamberg at gmail.com Thu May 5 17:15:38 2011 From: yutamberg at gmail.com (Yuta Tamberg) Date: Thu, 5 May 2011 08:15:38 -0700 (PDT) Subject: [R] Null In-Reply-To: <1304599725057-3498261.post@n4.nabble.com> References: <1304599725057-3498261.post@n4.nabble.com> Message-ID: <1304608538881-3498607.post@n4.nabble.com> There were no problems when I repeated the test with data provided. 
I simply created a vector xxxx<-c(1.600972896,1.534026106,1.633468456,1.69019608,1.686636269,1.713490543,1.460897843,1.604226053,1.547774705,1.575187845,1.50242712,1.489958479,1.555094449,1.56937391,1.46686762,1.583198774,1.59439255,1.627365857,1.596597096,1.598790507,1.596597096,1.613841822,1.607455023,1.586587305,1.72427587,1.668385917,1.743509765,1.5774918,1.709269961,1.507855872,1.650307523,1.670245853,1.721810615,1.613841822,1.586587305,1.658011397,1.595496222,1.662757832,1.521138084,1.564666064,1.515873844,1.596597096,1.617000341,1.621176282,1.598790507,1.73479983,1.498310554,1.571708832,1.426511261,1.698970004,1.534026106,1.5774918,1.682145076,1.689308859,1.654176542,1.526339277,1.545307116,1.658964843,1.638489257,1.557507202,1.604226053,1.627365857,1.651278014,1.627365857,1.559906625,1.720159303,1.64738297,1.62324929,1.698970004,1.704150517,1.57863921,1.558708571,1.681241237,1.539076099,1.5132176) and performed shapiro.test(xxxx) with following results: W = 0.9876, p-value = 0.677 As far as I understand R, complete.cases has nothing to do with normality check of single vector. pcc wrote: > > This is probably a very simple question but I am completely stumped!I am > trying to do shapiro.wilk(x) test on a relatively small dataset(75) and > each time my variable and keeps coming out as 'NULL' -- View this message in context: http://r.789695.n4.nabble.com/Null-tp3498261p3498607.html Sent from the R help mailing list archive at Nabble.com. From mikael.anderson at gmail.com Thu May 5 17:33:51 2011 From: mikael.anderson at gmail.com (Mikael Anderson) Date: Thu, 5 May 2011 17:33:51 +0200 Subject: [R] Compiling a FORTRAN program under Windows 7 Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From joubd at hotmail.com Thu May 5 17:34:02 2011 From: joubd at hotmail.com (David Joubert) Date: Thu, 5 May 2011 15:34:02 +0000 Subject: [R] Vermunt's LEM in R In-Reply-To: <81D92650D162664791CB7C482A878412055A3D32@kmbx2.utk.tennessee.edu> References: <81D92650D162664791CB7C482A878412055A3D32@kmbx2.utk.tennessee.edu> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From l.j.bonnett at gmail.com Thu May 5 17:28:34 2011 From: l.j.bonnett at gmail.com (Laura Bonnett) Date: Thu, 5 May 2011 16:28:34 +0100 Subject: [R] Confidence interval for difference in Harrell's c statistics (or equivalently Somers' D statistics) In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mmauro at students.unibe.ch Thu May 5 18:09:24 2011 From: mmauro at students.unibe.ch (Mauro) Date: Thu, 5 May 2011 09:09:24 -0700 (PDT) Subject: [R] perspective plot Message-ID: <1304611764230-3498754.post@n4.nabble.com> Hello, I`m tryint to plot some variable over x and y coordinate, but can`t figure out hot to do it ( I could do a simple scatterplot3d, but there I can`t change the viewing angle). My dataset looks something like this (dataframe): X Y Q95 21 2628711 1104437 0.7723994 22 2628721 1104437 0.5961789 23 2628731 1104437 1.2013182 24 2628741 1104437 1.3468632 25 2628751 1104437 1.1035517 26 2628761 1104437 1.0528809 Is there a way to plot Q95 over x and y? Something like persp(x,y,z) would be nice, but can`t figure out the right input format for this function. Thanks a lot, Mauro -- View this message in context: http://r.789695.n4.nabble.com/perspective-plot-tp3498754p3498754.html Sent from the R help mailing list archive at Nabble.com. 
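A minimal sketch of the input format that persp() expects, assuming the coordinates lie on a regular grid: two increasing vectors of x and y values, and a matrix z with one row per x value and one column per y value. The grid and the Q95 values below are made up for illustration (the rows shown in the question cover a single Y value only), and the names `grid`, `xs`, `ys` and `z` are hypothetical. Irregularly spaced points would first need to be interpolated onto a grid.

```r
# Made-up regular grid standing in for the X, Y and Q95 columns
grid <- expand.grid(X = seq(2628711, 2628761, by = 10),
                    Y = seq(1104437, 1104467, by = 10))
set.seed(42)
grid$Q95 <- runif(nrow(grid), min = 0.5, max = 1.5)   # invented values

# persp() wants increasing x and y vectors ...
xs <- sort(unique(grid$X))
ys <- sort(unique(grid$Y))

# ... and a matrix with rows indexed by x and columns indexed by y
z <- tapply(grid$Q95, list(grid$X, grid$Y), mean)

# theta and phi control the viewing angle
persp(xs, ys, z, theta = 30, phi = 25,
      xlab = "X", ylab = "Y", zlab = "Q95", ticktype = "detailed")
```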
From dwinsemius at comcast.net Thu May 5 18:17:25 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 12:17:25 -0400 Subject: [R] cross-correlation table with subscript or superscript to indicate significant differences In-Reply-To: References: Message-ID: <3495B5AA-A9DB-4342-96B4-BEE608F2B3AE@comcast.net> On May 5, 2011, at 10:48 AM, yoav baranan wrote: > > Hi, I wonder whether the following is possible with R, and whether > anyone has done that and can share his/her code with me. I have a > correlation matrix, and I want to create a correlation table that I > can copy to Microsoft Word with a superscript above each > correlation, indicating significant differences in the same row. > That is, when correlations in the same row do not share superscript, > it means that they are significantly different from each other. > thanks,yoav > [[alternative HTML version deleted]] An example with data and the desired result might help focus the discussion. This shows how to set up an example showing how extract the row numbers from a correlation matrix with absolute values above 0.5 but less than 1 (to exclude the trivial cases). > set.seed(123) > X <- matrix(rnorm(100), 10) > apply(cor(X), 2, function(x) which(abs(x) > 0.5 & x < 1) ) [[1]] [1] 2 4 8 [[2]] [1] 1 3 [[3]] [1] 2 6 9 [[4]] [1] 1 7 [[5]] integer(0) [[6]] [1] 3 10 [[7]] [1] 4 [[8]] [1] 1 [[9]] [1] 3 [[10]] [1] 6 This would extract the rownames if they are letters[1:10] > lapply( apply(cor(X), 2, function(x) which(abs(x) > 0.5 & x < 1) ), function(x) rownames(X)[x]) [[1]] [1] "b" "d" "h" [[2]] [1] "a" "c" [[3]] [1] "b" "f" "i" [[4]] [1] "a" "g" [[5]] character(0) [[6]] [1] "c" "j" [[7]] [1] "d" [[8]] [1] "a" [[9]] [1] "c" [[10]] [1] "f" Exactly how we are supposed to pass this to MS Word does not seem to be a proper question for this mailing list. 
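The second call above only returns letters if the matrix carries names. A self-contained variant might look like the sketch below, assuming the names are attached as column names of X (cor() works on columns, so they become the row and column names of the correlation matrix). The objects `R` and `strong` are hypothetical, and the 0.5 cutoff is the same arbitrary threshold as in the example.

```r
set.seed(123)
X <- matrix(rnorm(100), 10, dimnames = list(NULL, letters[1:10]))
R <- cor(X)

# For each variable, the names of the other variables with |r| > 0.5
strong <- lapply(seq_len(ncol(R)), function(j) {
  idx <- setdiff(which(abs(R[, j]) > 0.5), j)   # drop the variable itself
  rownames(R)[idx]
})
names(strong) <- colnames(R)
strong
```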
--
David Winsemius, MD
West Hartford, CT

From abatealem at gmail.com Thu May 5 18:03:22 2011
From: abatealem at gmail.com (Alemtsehai Abate)
Date: Thu, 5 May 2011 17:03:22 +0100
Subject: [R] simulate AR(1) process
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From spector at stat.berkeley.edu Thu May 5 18:27:08 2011
From: spector at stat.berkeley.edu (Phil Spector)
Date: Thu, 5 May 2011 09:27:08 -0700 (PDT)
Subject: [R] simulate AR(1) process
In-Reply-To:
References:
Message-ID:

?arima.sim

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu

On Thu, 5 May 2011, Alemtsehai Abate wrote:

> Dear R users,
> May any of you tell me how to simulate data on:
> y_t = a+b*y_{t-1} + u_t
> where u_t~N(0,sigma^2), b<1, and for some constant a.
>
> Many thanks
>
> Tsegaye
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From clint at ecy.wa.gov Thu May 5 18:28:27 2011
From: clint at ecy.wa.gov (Clint Bowman)
Date: Thu, 5 May 2011 09:28:27 -0700 (PDT)
Subject: [R] Compiling a FORTRAN program under Windows 7
In-Reply-To:
References:
Message-ID:

You are compiling a subroutine, not a program, and your compile line should read:

gfortran -c testit.f -o testit.o

You then reference that object code testit.o in your final loading stage, after compiling the other routines and the main program.
-- Clint Bowman INTERNET: clint at ecy.wa.gov Air Quality Modeler INTERNET: clint at math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 USPS: PO Box 47600, Olympia, WA 98504-7600 Parcels: 300 Desmond Drive, Lacey, WA 98503-1274 On Thu, 5 May 2011, Mikael Anderson wrote: > Hi, > > I am trying to compile a FORTRAN program to call from R under Windows 7 but > I am having problem in the compiling step. To demonstrate this is the > program testit.f: > > ------------------------------------------ > subroutine TESTIT(x,n,m) > dimension x(n) > do 10 i=1,n > 10 x(i)=x(i)**m > end > -------------------------------------------- > > When I compile it with gfortran I get the following error: > > -------------------------------------------------- > c:\MinGW\programs>gfortran testit.f -o testit.o > c:/mingw/bin/../lib/gcc/mingw32/4.5.2/../../../libmingw32.a(main.o):main.c:(.tex > t+0xd2): undefined reference to `WinMain at 16' > collect2: ld returned 1 exit status. > ---------------------------------------------------- > > I should add that a program like the following hello.f compiles with no > problem. > > ------------------------------------------ > READ (*, *) YOURNAME > WRITE (*, 200) YOURNAME > 200 FORMAT(//,' Hello ',A/) > STOP > END > ------------------------------------------ > > I realize that this is not directly a question about R but I guess there are > some people here who have compiled FORTRAN programs under Windows 7 to call > from R. I appreciate any help to fix the problem. > > /Mikael > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> From thomas.levine at gmail.com Thu May 5 18:32:42 2011 From: thomas.levine at gmail.com (Thomas Levine) Date: Thu, 5 May 2011 12:32:42 -0400 Subject: [R] Null In-Reply-To: <1304599725057-3498261.post@n4.nabble.com> References: <1304599725057-3498261.post@n4.nabble.com> Message-ID: Maybe you were doing something like fcv <- read.csv('fcv.csv') instead of fcv <- read.csv('fcv.csv')[1] (I haven't tested this.) Tom On Thu, May 5, 2011 at 8:48 AM, pcc wrote: > > This is probably a very simple question but I am completely stumped!I am > trying to do shapiro.wilk(x) test on a relatively small dataset(75) and each > time my variable and keeps coming out as 'NULL', and > > > shapiro.test(fcv) > Error in complete.cases(x) : no input has determined the number of cases > > my text file looks like this: > > case > 1.600972896 > 1.534026106 > 1.633468456 > 1.69019608 > 1.686636269 > 1.713490543 > 1.460897843 > 1.604226053 > 1.547774705 > 1.575187845 > 1.50242712 > 1.489958479 > 1.555094449 > 1.56937391 > 1.46686762 > 1.583198774 > 1.59439255 > 1.627365857 > 1.596597096 > 1.598790507 > 1.596597096 > 1.613841822 > 1.607455023 > 1.586587305 > 1.72427587 > 1.668385917 > 1.743509765 > 1.5774918 > 1.709269961 > 1.507855872 > 1.650307523 > 1.670245853 > 1.721810615 > 1.613841822 > 1.586587305 > 1.658011397 > 1.595496222 > 1.662757832 > 1.521138084 > 1.564666064 > 1.515873844 > 1.596597096 > 1.617000341 > 1.621176282 > 1.598790507 > 1.73479983 > 1.498310554 > 1.571708832 > 1.426511261 > 1.698970004 > 1.534026106 > 1.5774918 > 1.682145076 > 1.689308859 > 1.654176542 > 1.526339277 > 1.545307116 > 1.658964843 > 1.638489257 > 1.557507202 > 1.604226053 > 1.627365857 > 1.651278014 > 1.627365857 > 1.559906625 > 1.720159303 > 1.64738297 > 1.62324929 > 1.698970004 > 1.704150517 > 1.57863921 > 1.558708571 > 1.681241237 > 1.539076099 > 1.5132176 > > Any ideas? 
> > -- > View this message in context: http://r.789695.n4.nabble.com/Null-tp3498261p3498261.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Greg.Snow at imail.org Thu May 5 18:39:04 2011 From: Greg.Snow at imail.org (Greg Snow) Date: Thu, 5 May 2011 10:39:04 -0600 Subject: [R] Using error in histograms In-Reply-To: References: Message-ID: The logspline package does density estimation in a different way than KDE, but it does allow for interval censored data (I know this value is between a and b, but not where in that range) using the oldlogspline function. This may be what you are looking for. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of Kristof Ostir > Sent: Thursday, May 05, 2011 4:16 AM > To: r-help at r-project.org > Subject: [R] Using error in histograms > > Hello! > > I am trying to produce a histogram of measurement data (orientation of > archaeological structures) that are a subject to measurement error. > The normal histogram just computes frequencies, but does not take into > account that a particular value is spread over a range of values (in > my case the spread is different for reach measurement and is larger > than the bin size). > > The closest approach is kernel density estimation (in the image is a > comparison of histogram and KDE): > http://en.wikipedia.org/wiki/Kernel_density_estimation > http://en.wikipedia.org/wiki/File:Comparison_of_1D_histogram_and_KDE.pn > g > However in my case the kernel size is different for each value. 
I > wrote a program in IDL that performs the plotting, but am just > wondering if such a function is available in R. Basically it is a > problem of summing (and later plotting) several data distributions. > > I would appreciate also any hint to a book that might be dealing with > the problem. I am not an expert in statistics and I might not be using > use the correct terminology in web searches. > > Regards, > > Kristof > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From russ.abbott at gmail.com Thu May 5 18:42:00 2011 From: russ.abbott at gmail.com (Russ Abbott) Date: Thu, 5 May 2011 09:42:00 -0700 Subject: [R] quantmod's addTA plotting functions In-Reply-To: <4DC27E7C.6050204@ucalgary.ca> References: <4DC27E7C.6050204@ucalgary.ca> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From bhh at xs4all.nl Thu May 5 18:43:11 2011 From: bhh at xs4all.nl (Berend Hasselman) Date: Thu, 5 May 2011 09:43:11 -0700 (PDT) Subject: [R] Compiling a FORTRAN program under Windows 7 In-Reply-To: References: Message-ID: <1304613791035-3498839.post@n4.nabble.com> Mikael Anderson wrote: > > Hi, > > I am trying to compile a FORTRAN program to call from R under Windows 7 > but > I am having problem in the compiling step. To demonstrate this is the > program testit.f: > > ------------------------------------------ > subroutine TESTIT(x,n,m) > dimension x(n) > do 10 i=1,n > 10 x(i)=x(i)**m > end > -------------------------------------------- > > In addition to the previous remarks, you would do yourself a favour by utilizing the implicit-none option of gfortran. That will force you to declare every variable (and its type) which will avoid many nasty bugs. 
In your case: x is a REAL single precision but with R you will preferably need double precision. So I would advise you to use gfortran -fimplicit-none ...... Berend -- View this message in context: http://r.789695.n4.nabble.com/Compiling-a-FORTRAN-program-under-Windows-7-tp3498663p3498839.html Sent from the R help mailing list archive at Nabble.com. From clint at ecy.wa.gov Thu May 5 18:43:55 2011 From: clint at ecy.wa.gov (Clint Bowman) Date: Thu, 5 May 2011 09:43:55 -0700 (PDT) Subject: [R] Null In-Reply-To: References: <1304599725057-3498261.post@n4.nabble.com> Message-ID: with(fcv,shapiro.test(case)) -- Clint Bowman INTERNET: clint at ecy.wa.gov Air Quality Modeler INTERNET: clint at math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 USPS: PO Box 47600, Olympia, WA 98504-7600 Parcels: 300 Desmond Drive, Lacey, WA 98503-1274 On Thu, 5 May 2011, Thomas Levine wrote: > Maybe you were doing something like > > fcv <- read.csv('fcv.csv') > > instead of > > fcv <- read.csv('fcv.csv')[1] > > (I haven't tested this.) 
> > Tom > > On Thu, May 5, 2011 at 8:48 AM, pcc wrote: >> >> This is probably a very simple question but I am completely stumped!I am >> trying to do shapiro.wilk(x) test on a relatively small dataset(75) and each >> time my variable and keeps coming out as 'NULL', and >> >>> shapiro.test(fcv) >> Error in complete.cases(x) : no input has determined the number of cases >> >> my text file looks like this: >> >> case >> 1.600972896 >> 1.534026106 >> 1.633468456 >> 1.69019608 >> 1.686636269 >> 1.713490543 >> 1.460897843 >> 1.604226053 >> 1.547774705 >> 1.575187845 >> 1.50242712 >> 1.489958479 >> 1.555094449 >> 1.56937391 >> 1.46686762 >> 1.583198774 >> 1.59439255 >> 1.627365857 >> 1.596597096 >> 1.598790507 >> 1.596597096 >> 1.613841822 >> 1.607455023 >> 1.586587305 >> 1.72427587 >> 1.668385917 >> 1.743509765 >> 1.5774918 >> 1.709269961 >> 1.507855872 >> 1.650307523 >> 1.670245853 >> 1.721810615 >> 1.613841822 >> 1.586587305 >> 1.658011397 >> 1.595496222 >> 1.662757832 >> 1.521138084 >> 1.564666064 >> 1.515873844 >> 1.596597096 >> 1.617000341 >> 1.621176282 >> 1.598790507 >> 1.73479983 >> 1.498310554 >> 1.571708832 >> 1.426511261 >> 1.698970004 >> 1.534026106 >> 1.5774918 >> 1.682145076 >> 1.689308859 >> 1.654176542 >> 1.526339277 >> 1.545307116 >> 1.658964843 >> 1.638489257 >> 1.557507202 >> 1.604226053 >> 1.627365857 >> 1.651278014 >> 1.627365857 >> 1.559906625 >> 1.720159303 >> 1.64738297 >> 1.62324929 >> 1.698970004 >> 1.704150517 >> 1.57863921 >> 1.558708571 >> 1.681241237 >> 1.539076099 >> 1.5132176 >> >> Any ideas? >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/Null-tp3498261p3498261.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From Greg.Snow at imail.org Thu May 5 18:49:51 2011 From: Greg.Snow at imail.org (Greg Snow) Date: Thu, 5 May 2011 10:49:51 -0600 Subject: [R] Insert values to histogram In-Reply-To: <1304596216055-3498140.post@n4.nabble.com> References: <1304596216055-3498140.post@n4.nabble.com> Message-ID: Are you really sure that you want to do that? Read the discussion starting with this post: http://tolstoy.newcastle.edu.au/R/e2/help/07/08/22858.html for reasons why you probably don't (yes, the question is about bar plots not histograms, but much of it will still apply). Near the end of the discussion there are examples of alternatives and some ways to add that may apply to your question if you still feel the need. Part of the answer depends on how you are creating your histogram in the first place, 3 different functions pop to my mind that create histograms (and I am sure there are plenty more if I looked), how to add numbers in each case would be very different, so any help we offered (beyond generalities mentioned above) could be more misleading than helpful without that information. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of matibie > Sent: Thursday, May 05, 2011 5:50 AM > To: r-help at r-project.org > Subject: [R] Insert values to histogram > > I'm trying to add the exact value on top of each column of an > Histogram, i > have been trying with the text function but it doesn't work. 
> The problem is that the program itself decides the exact value to give > to > each column, and it is not like in a bar-plot, where I know exactly > which > values are being plotted. > If anyone has any idea on how to do this > Thanks > Matias > > -- > View this message in context: http://r.789695.n4.nabble.com/Insert- > values-to-histogram-tp3498140p3498140.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From jeff.a.ryan at gmail.com Thu May 5 18:55:02 2011 From: jeff.a.ryan at gmail.com (Jeff Ryan) Date: Thu, 5 May 2011 11:55:02 -0500 Subject: [R] quantmod's addTA plotting functions In-Reply-To: References: <4DC27E7C.6050204@ucalgary.ca> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djmuser at gmail.com Thu May 5 19:07:56 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Thu, 5 May 2011 10:07:56 -0700 Subject: [R] Boxplot in order In-Reply-To: <4238836B2F8D4598BC9BFD477324F83D@ccePC> References: <4238836B2F8D4598BC9BFD477324F83D@ccePC> Message-ID: Hi: Try this: karla = data.frame( Groups = factor(rep(c('CPre','SPre','C7','S7','C14','S14','C21','S21'), 11), levels = c('CPre','SPre','C7','S7','C14','S14','C21','S21')), Time = rep(c(0,7,14,21), 11), Resp = rnorm(88) ) boxplot(Resp~Groups, data = karla) Since you didn't have a variable valor defined, I substituted in random normal deviates.
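A minimal sketch of why the levels argument matters for the boxplot order, assuming the group labels arrive as plain character strings. The names g and d are hypothetical and the responses are simulated. By default factor() sorts the labels, whereas levels = unique(g) keeps the order of first appearance, which is the order boxplot() then uses:

```r
# Hypothetical labels, listed in the order they should appear on the axis
g <- rep(c("CPre", "SPre", "C7", "S7", "C14", "S14", "C21", "S21"), 11)

levels(factor(g))                      # default: labels are sorted
levels(factor(g, levels = unique(g)))  # order of first appearance is kept

set.seed(1)
d <- data.frame(Groups = factor(g, levels = unique(g)),
                Resp   = rnorm(length(g)))  # simulated responses

# Boxes are drawn in the order of levels(d$Groups)
boxplot(Resp ~ Groups, data = d)
```

Using unique(g) avoids typing the labels twice, but it only helps when the first occurrences in the data are already in the wanted order.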
HTH, Dennis On Thu, May 5, 2011 at 7:41 AM, Silvano wrote: > Hi, > > I need to construct a box plot, but I want to keep the Groups order > > karla = data.frame( > Groups = factor(rep(c('CPre','SPre','C7','S7','C14','S14','C21','S21'), > 11)), > Time = rep(c(0,7,14,21), 11), > Resp = valor > ) > > boxplot(Resp~Groups, order=T) > > doesn't work. > > How do I do this? > > -------------------------------------- > Silvano Cesar da Costa > Departamento de Estatística > Universidade Estadual de Londrina > Fone: 3371-4346 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From djsloan at liv.ac.uk Thu May 5 19:08:21 2011 From: djsloan at liv.ac.uk (dereksloan) Date: Thu, 5 May 2011 10:08:21 -0700 (PDT) Subject: [R] Using functions/loops for repetitive commands In-Reply-To: <9436384F-B91B-49AF-AFBF-395690C537DE@comcast.net> References: <1304590606704-3498006.post@n4.nabble.com> <1304604084040-3498427.post@n4.nabble.com> <9436384F-B91B-49AF-AFBF-395690C537DE@comcast.net> Message-ID: <1304615301253-3498896.post@n4.nabble.com> Thanks David, I did notice that and I got his code to work using wilcox.test for the continuous variables. The problem is that when I tried to alter the code to do chisq.test on my categorical variables there is something wrong with the syntax and I don't know what. Derek -- View this message in context: http://r.789695.n4.nabble.com/Using-functions-loops-for-repetitive-commands-tp3498006p3498896.html Sent from the R help mailing list archive at Nabble.com.
From Greg.Snow at imail.org Thu May 5 19:08:25 2011 From: Greg.Snow at imail.org (Greg Snow) Date: Thu, 5 May 2011 11:08:25 -0600 Subject: [R] perspective plot In-Reply-To: <1304611764230-3498754.post@n4.nabble.com> References: <1304611764230-3498754.post@n4.nabble.com> Message-ID: The persp function expects z to be a matrix, so you could reshape your data so that z is a matrix (the reshape function or package may help). Or the wireframe function in the lattice package expects data more like what you show; that may be the easiest solution. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of Mauro > Sent: Thursday, May 05, 2011 10:09 AM > To: r-help at r-project.org > Subject: [R] perspective plot > > Hello, > > I'm trying to plot some variable over x and y coordinate, but can't > figure > out how to do it ( I could do a simple scatterplot3d, but there I can't > change the viewing angle). > > My dataset looks something like this (dataframe): > > X Y Q95 > 21 2628711 1104437 0.7723994 > 22 2628721 1104437 0.5961789 > 23 2628731 1104437 1.2013182 > 24 2628741 1104437 1.3468632 > 25 2628751 1104437 1.1035517 > 26 2628761 1104437 1.0528809 > > Is there a way to plot Q95 over x and y? Something like persp(x,y,z) > would > be nice, but can't figure out the right input format for this function. > > > Thanks a lot, > > Mauro > > -- > View this message in context: > http://r.789695.n4.nabble.com/perspective-plot-tp3498754p3498754.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.
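A rough sketch of the two routes mentioned in this thread, assuming the points lie on a regular X/Y grid. The data frame dat is hypothetical: the column names follow the poster's example, but the values are invented for illustration. tapply() is used here as one possible way to build the matrix that persp() wants; the reshape tools mentioned above would be another.

```r
# Hypothetical long-format data: one row per grid point (values are made up)
dat <- expand.grid(X = seq(2628711, 2628761, by = 10),
                   Y = seq(1104437, 1104467, by = 10))
dat$Q95 <- with(dat, 1 + 0.3 * sin((X - 2628711) / 20) +
                         0.2 * cos((Y - 1104437) / 10))

# Route 1: persp() needs increasing x and y vectors plus a matrix z
# with length(x) rows and length(y) columns
x <- sort(unique(dat$X))
y <- sort(unique(dat$Y))
z <- with(dat, tapply(Q95, list(X, Y), mean))  # grid cells without data would be NA

persp(x, y, z, theta = 30, phi = 25,           # theta/phi set the viewing angle
      xlab = "X", ylab = "Y", zlab = "Q95")

# Route 2: wireframe() accepts the long format directly via a formula
if (requireNamespace("lattice", quietly = TRUE)) {
  print(lattice::wireframe(Q95 ~ X * Y, data = dat,
                           screen = list(z = 30, x = -60)))
}
```

If the coordinates do not form a complete grid, z will contain NA cells and some interpolation step would be needed first; that is outside this sketch.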
From ehlers at ucalgary.ca Thu May 5 19:13:24 2011 From: ehlers at ucalgary.ca (P Ehlers) Date: Thu, 05 May 2011 11:13:24 -0600 Subject: [R] quantmod's addTA plotting functions In-Reply-To: References: <4DC27E7C.6050204@ucalgary.ca> Message-ID: <4DC2DAB4.2090203@ucalgary.ca> Russ, All you have to do is replace addTA(GSPC.EMA.3, on = 1, col = "#0000ff") with plot(addTA(GSPC.EMA.3, on = 1, col = "#0000ff")) etc. I can sympathize with the documentation frustration, but I think that much of the documentation in R and in many R packages is actually very good. I get much more frustrated with the attempts at 'non-technical' explanations I find in other software. It does take a bit of getting used to always looking at the Value section and, if in doubt, checking some of the See Alsos, but it's worth it. I don't know quantmod very well, but even a cursory look at the pdf file shows that the docs are quite good. As Jeff points out, good documentation is not easy. More good examples are always better, but that's mighty time-consuming. Peter Ehlers On 2011-05-05 10:42, Russ Abbott wrote: > Thanks. You're right. I didn't see that. I read the ?addTA help page, > which (annoyingly) didn't mention that feature, but I didn't read the > ?TA page. (That page was mentioned as a see also, but not as a must see.) > > I don't know what it means to wrap these calls in a plot call. I tried > to put the addTA calls into a function and call that function from the > higher level function, but that didn't work either. Would you tell me > what it means to wrap these calls in a plot call. > > Thanks > /-- Russ / > > P.S. Pardon my irritation, but I continually find that many of the help > files assume one already knows the information one is looking for. If > you don't know it, the help files are not very helpful. This is a good > example. In fact, it's two good examples. 
I didn't know that I had to > look at another page, and I (still) don't know what it means to wrap > plot calls in another plot call. > > > On Thu, May 5, 2011 at 3:39 AM, P Ehlers > wrote: > > On 2011-05-05 0:47, Russ Abbott wrote: > > Hi, > > I'm having trouble with quantmod's addTA plotting functions. > They seem to > work fine when run from the command line. But when run inside a > function, > only the last one run is visible. Here's an example. > > > test.addTA<- function(from = "2010-06-01") { > getSymbols("^GSPC", from = from) > GSPC.close<- GSPC[,"GSPC.Close"] > GSPC.EMA.3<- EMA(GSPC.close, n=3, ratio=NULL) > GSPC.EMA.10<- EMA(GSPC.close, n=10, ratio=NULL) > chartSeries(GSPC.close, theme=chartTheme('white'), > up.col="black", > dn.col="black") > addTA(GSPC.EMA.3, on = 1, col = "#0000ff") > addTA(GSPC.EMA.10, on = 1, col = "#ff0000") > # browser() > } > > > When I run this, GSPC.close always appears. But only GSPC.EMA10 > appears on > the plot along with it. If I switch the order of the addTA calls, > only GSPC.EMA3 appears. If I uncomment the call to browser() > neither appears > when the browser() interrupt occurs. I can then draw both > GSPC.EMA.3 and > GSPC.EMA10 manually, and let the function terminate. All > intended plots are > visible after the function terminates. So it isn't as if one > wipes out the > other. This shows that it's possible to get all three lines on > the plot, but > I can't figure out how to do it without manual intervention. Any > suggestions > are appreciated. > > > Perhaps you didn't see this NOTE on the ?TA help page: > > "Calling any of the above methods from within a function > or script will generally require them to be wrapped in a > plot call as they rely on the context of the call to > initiate the actual charting addition." > > Peter Ehlers > > > Thanks. 
> > *-- Russ * > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > From jeff.a.ryan at gmail.com Thu May 5 19:04:34 2011 From: jeff.a.ryan at gmail.com (Jeff Ryan) Date: Thu, 5 May 2011 12:04:34 -0500 Subject: [R] quantmod's addTA plotting functions In-Reply-To: References: <4DC27E7C.6050204@ucalgary.ca> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Thu May 5 19:33:47 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 13:33:47 -0400 Subject: [R] Using functions/loops for repetitive commands In-Reply-To: <1304615301253-3498896.post@n4.nabble.com> References: <1304590606704-3498006.post@n4.nabble.com> <1304604084040-3498427.post@n4.nabble.com> <9436384F-B91B-49AF-AFBF-395690C537DE@comcast.net> <1304615301253-3498896.post@n4.nabble.com> Message-ID: <6495E332-571E-4AA4-970F-C5CC5D3B6EEC@comcast.net> On May 5, 2011, at 1:08 PM, dereksloan wrote: > Thanks David, > > I did notice that and I got his code to work using wilcox.test for the > continuous variables. > > The problem is that when I tried to alter the code to do chisq.test > on my > categorical variables there is something wrong with the syntax and I > don't > know what. Right.... > ?chisq.test # No mention of a formula argument seen > ?chisq.test.formula No documentation for 'chisq.test.formula' in specified packages and libraries: you could try '??chisq.test.formula' `chisq.test` doesn't have a formula method, so sending it a formula will fail. Why aren't you sending it the arguments instead of turning them into strings? 
> > Derek > > -- > View this message in context: http://r.789695.n4.nabble.com/Using-functions-loops-for-repetitive-commands-tp3498006p3498896.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From djsloan at liv.ac.uk Thu May 5 19:45:01 2011 From: djsloan at liv.ac.uk (dereksloan) Date: Thu, 5 May 2011 10:45:01 -0700 (PDT) Subject: [R] Using functions/loops for repetitive commands In-Reply-To: <682a4065-bf4b-4f7a-ae27-1cc372a92f8d@35g2000prp.googlegroups.com> References: <1304590606704-3498006.post@n4.nabble.com> <682a4065-bf4b-4f7a-ae27-1cc372a92f8d@35g2000prp.googlegroups.com> Message-ID: <1304617501497-3499001.post@n4.nabble.com> Thanks a lot, I understand what you say but I'm having problems - maybe with the syntax or the specific command. You are right - I have a dataframe to store the data and want to automate the analysis. i.e. I want do a chisq.test with to know if alcohol intake (Y/N) differs between sexes, then if smoking (Y/N) differs between sexes, then if alcohol intake or smoking differ by hiv status. The command within my data frame for each individual comparison is e.g. chisq.test(alcohol,sex)... then repeat it for all combination of variables. but using lapply I'm still unsure how to design the loop. I'll keep trying - let me know if you have more ideas. Derek -- View this message in context: http://r.789695.n4.nabble.com/Using-functions-loops-for-repetitive-commands-tp3498006p3499001.html Sent from the R help mailing list archive at Nabble.com. 
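One way the loop asked about here could be laid out, offered only as a sketch. The data frame d is simulated and its column names (sex, hiv, alcohol, smoking) are assumptions based on the description in the message. Since chisq.test() has no formula method, the two columns are passed as vectors. Because several tests are run at once, the sketch also adds a Holm adjustment, in line with the caution elsewhere in this thread about generating many p-values.

```r
set.seed(1)
# Hypothetical data frame; values are simulated for illustration only
d <- data.frame(
  sex     = factor(sample(c("F", "M"), 200, TRUE)),
  hiv     = factor(sample(c("neg", "pos"), 200, TRUE)),
  alcohol = factor(sample(c("N", "Y"), 200, TRUE)),
  smoking = factor(sample(c("N", "Y"), 200, TRUE))
)

# Every outcome/grouping pair to be tested, one row per comparison
pairs <- expand.grid(outcome = c("alcohol", "smoking"),
                     group   = c("sex", "hiv"),
                     stringsAsFactors = FALSE)

# chisq.test() takes two factors (or a table), so index the columns by name
pairs$p.value <- mapply(function(o, g) chisq.test(d[[o]], d[[g]])$p.value,
                        pairs$outcome, pairs$group)

# Adjust for having run several tests
pairs$p.adj <- p.adjust(pairs$p.value, method = "holm")
pairs
```

Keeping the comparisons in a data frame makes it easy to add or drop pairs without rewriting the test call.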
From ananta.acharya at gmail.com Thu May 5 19:06:34 2011 From: ananta.acharya at gmail.com (antu) Date: Thu, 5 May 2011 10:06:34 -0700 (PDT) Subject: [R] distance matrix In-Reply-To: References: <1304591665763-3498033.post@n4.nabble.com> Message-ID: <1304615194140-3498890.post@n4.nabble.com> I don't know whether I understood your question, but 1.1, 1.2 , 1.3 all are subsample of 1 , so, rather than comparing 1000 subsample, comparison of 20 pop level makes more sense in my case. thanks for query ----- Ananta Acharya Graduate Student -- View this message in context: http://r.789695.n4.nabble.com/distance-matrix-tp3498033p3498890.html Sent from the R help mailing list archive at Nabble.com. From dwinsemius at comcast.net Thu May 5 19:45:19 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 13:45:19 -0400 Subject: [R] cross-correlation table with subscript or superscript to indicate significant differences In-Reply-To: References: Message-ID: <546F77F1-C50E-431E-B2C1-B80A6FEE7399@comcast.net> On May 5, 2011, at 12:40 PM, yoav baranan wrote: > Here is an example for my earlier question. > > Say you have a 3x3 correlation matrix: > corrs <- matrix(c(0.25,0.32,0.66,0.14,0.24,0.34,0.44,0.34,0.11), > nrow=3, ncol=3, dimnames = list(c('varA','varB', 'varC'), > c('varA','varB', 'varC'))) > And another matrix for the sample size of each correlation: > sizes <- matrix(c(44,68,313,142,144,207,201,100,99), nrow=3, ncol=3, > dimnames = list(c('varA','varB', 'varC'), c('varA','varB', 'varC'))) > > corrs looks like this: > varA varB varC > varA 0.05 0.14 0.44 > varB 0.32 0.24 0.34 > varC 0.66 0.57 0.50 > > sizes: > varA varB varC > varA 44 142 201 > varB 68 144 100 > varC 313 207 99 > > i.e., the correlation between variables A and C was 0.66 with sample > size of 313. (I got these tables from rcorr). Why not offer the result of dput() on the result from rcorrs cone on 3 variables. 
Then we should have the necessary building blocks for your original request. This way we do not have the matrix of p-values. And as the Posting Guide clearly says ...Please post in plain text. In your case it is particularly annoying because I do not get the filtered version but rather your HTML version and the text is almost unreadable at a font size of 10 in whatever font it is specifying! > > What I want to do is to compare the correlations in each row > (probably using r.test), and then create a correlation table with > subscripts or superscripts indicating the significance > "group" (again: correlations with different superscripts, in the > same row, are significantly different from each other). > Something like this: > varA varB varC > varA 0.05b 0.14b 0.44a > varB 0.32a 0.24a 0.34a > varC 0.66a 0.57ab 0.50b At the moment we have no way of determining what values should be post-pended with which letters. > > Of course, I don't have a 3x3 table. I have about 20 tables of at > least 7x7 each, so this is why I'm looking for methods to automate > the process. > > Thanks, > Yoav > > > CC: r-help at r-project.org > > From: dwinsemius at comcast.net > > To: ybaranan at hotmail.com > > Subject: Re: [R] cross-correlation table with subscript or > superscript to indicate significant differences > > Date: Thu, 5 May 2011 12:17:25 -0400 > > > > > > On May 5, 2011, at 10:48 AM, yoav baranan wrote: > > > > > > > > Hi, I wonder whether the following is possible with R, and whether > > > anyone has done that and can share his/her code with me. I have a > > > correlation matrix, and I want to create a correlation table > that I > > > can copy to Microsoft Word with a superscript above each > > > correlation, indicating significant differences in the same row. > > > That is, when correlations in the same row do not share > superscript, > > > it means that they are significantly different from each other.
> > > thanks,yoav > > > [[alternative HTML version deleted]] > > > > An example with data and the desired result might help focus the > > discussion. > > > > This shows how to set up an example showing how extract the row > > numbers from a correlation matrix with absolute values above 0.5 but > > less than 1 (to exclude the trivial cases). > > snipped > > David Winsemius, MD West Hartford, CT From gleynes at gmail.com Thu May 5 19:08:18 2011 From: gleynes at gmail.com (Gene Leynes) Date: Thu, 5 May 2011 12:08:18 -0500 Subject: [R] Using $ accessor in GAM formula Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ybaranan at hotmail.com Thu May 5 18:40:17 2011 From: ybaranan at hotmail.com (yoav baranan) Date: Thu, 5 May 2011 19:40:17 +0300 Subject: [R] cross-correlation table with subscript or superscript to indicate significant differences Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Thu May 5 20:01:33 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 14:01:33 -0400 Subject: [R] Using functions/loops for repetitive commands In-Reply-To: <1304617501497-3499001.post@n4.nabble.com> References: <1304590606704-3498006.post@n4.nabble.com> <682a4065-bf4b-4f7a-ae27-1cc372a92f8d@35g2000prp.googlegroups.com> <1304617501497-3499001.post@n4.nabble.com> Message-ID: <8618B6A2-B45F-4548-B9C0-2102CF6CB7B4@comcast.net> On May 5, 2011, at 1:45 PM, dereksloan wrote: > Thanks a lot, > > I understand what you say but I'm having problems - maybe with the > syntax or > the specific command. > > You are right - I have a dataframe to store the data and want to > automate > the analysis. > > i.e. I want do a chisq.test with to know if alcohol intake (Y/N) > differs > between sexes, then if smoking (Y/N) differs between sexes, then if > alcohol > intake or smoking differ by hiv status. 
> > The command within my data frame for each individual comparison is > e.g. > > chisq.test(alcohol,sex)... then repeat it for all combinations of > variables. I don't generally answer questions that support shotgun approaches to manufacturing p-values for fear of encouraging unprincipled data-mining ... unless it is clear that the questioner understands what he is doing from a statistical point of view. So my apologies. I probably shouldn't have even posted in this case. I misunderstood the question and thought it was just a quick syntactic fix. I now understand it to be more involved and to really demand more care and respect than I was giving it. > > but using lapply I'm still unsure how to design the loop. > > I'll keep trying - let me know if you have more ideas. > > Derek -- David Winsemius, MD West Hartford, CT From dwinsemius at comcast.net Thu May 5 20:06:19 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 14:06:19 -0400 Subject: [R] Using $ accessor in GAM formula In-Reply-To: References: Message-ID: <6A4060E3-3093-48A2-A83E-9E66E57530DE@comcast.net> On May 5, 2011, at 1:08 PM, Gene Leynes wrote: > This is not mission critical, but it's bothering me. I'm getting
I'm getting > inconsistent results when I use the $ accessor in the gam formula > > *In window #1:* >> library(mgcv) >> dat=data.frame(x=1:100,y=sin(1:100/50)+rnorm(100,0,.05)) >> str(dat) >> gam(dat$y~s(dat$x)) > Error in eval(expr, envir, enclos) : object 'x' not found I get the same error, but using the standard R approach to passing data and formulae to regression functions I have not difficulty: > gam(y~s(x), data=dat) Family: gaussian Link function: identity Formula: y ~ s(x) Estimated degrees of freedom: 4.1552 total = 5.155229 GCV score: 0.002242771 >> > > *In window #2:* >> gm = gam(dat$cf~s(dat$s)) >> gm > > Family: gaussian > Link function: identity > > Formula: > dat$cf ~ s(dat$s) > > Estimated degrees of freedom: > 8.7757 total = 9.77568980091 > > GCV score: 302.551417213 > > > > > Has anyone else seen the same thing? > > In both cases I'm using Windows 7 and R 2.13.0 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From projectbasu at gmail.com Thu May 5 20:06:30 2011 From: projectbasu at gmail.com (swaraj basu) Date: Thu, 5 May 2011 20:06:30 +0200 Subject: [R] R CMD check warning Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From asanramzan at yahoo.com Thu May 5 20:31:06 2011 From: asanramzan at yahoo.com (Asan Ramzan) Date: Thu, 5 May 2011 11:31:06 -0700 (PDT) Subject: [R] ANOVA Message-ID: <130635.1126.qm@web44709.mail.sp1.yahoo.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From S.Ellison at lgc.co.uk Thu May 5 20:55:03 2011 From: S.Ellison at lgc.co.uk (S Ellison) Date: Thu, 05 May 2011 19:55:03 +0100 Subject: [R] ANOVA 1 too few degrees of freedom Message-ID: >>> Rovinpiper 04/05/2011 22:43 >>> >So this seems to indicate that I have what I want. I have two >respiration data points at each plot on each day. Yes; if you had only Plot+Day you'd have a completely balanced full factorial ... for Plot and Day. But I think I now see an answer to your puzzlement. Your original ANOVA table was Df Sum Sq Mean Sq F value Pr(>F) Combined.Trt 1 52.80 52.805 2.0186e+30 < 2.2e-16 *** as.factor(Combined.Plot) 10 677.69 67.769 2.5907e+30 < 2.2e-16 *** as.factor(Combined.Day) 16 2817.47 176.092 6.7317e+30 < 2.2e-16 *** Combined.Trt:as.factor(Combined.Day) 16 47.82 2.989 1.1426e+29 < 2.2e-16 *** as.factor(Combined.Day):Combined.Plot 160 611.21 3.820 1.4604e+29 < 2.2e-16 *** Residuals 204 0.00 0.000 Now, your Day:Plot combination has 17*12=204 combinations. And you are showing 204 residual DOF, which is exactly right for two observations per group and 204 groups. That's exactly right for the model Day+Plot+Day:Plot. But you have Trt in the model as well. Where are the Treatment DoF coming from? Clearly, the treatment DoF has not come out of the residual term. That means the treatment DoF must have come from somewhere else. Since you've lost 2 instead of 1 DoF from Plot and kept your n[Day]-1 degrees of freedom for Day, I would guess that Plot is nested in Trt. If that is the case, you'd 'obviously' expect n[Plot]-2 dof for Plot. And indeed i could replicate your ANOVA table DoF with Trt<-gl(2,204) Plot<-gl(12,34) #Note that Plot is NOT fully crossed with Trt; each Trt has only 6 Plots Day <- gl(17,2,408) y<-rnorm(408) anova(lm(y~Trt+Plot+Day+Trt:Day+Day:Plot)) So that would be one data structure that would explain your DoF. But if that is indeed the structure, there are other concerns that you may need to look out for. 
First, compare the two models anova(lm(y~Trt+Plot+Day+Trt:Day+Day:Plot)) #and anova(lm(y~Trt+Plot+Day+Day:Plot+Trt:Day)) Same terms in the model specification, but different DoF and Trt:Day has completely vanished from the second ANOVA table. Something is clearly either aliased, unbalanced or incorrectly specified. In my presumed version of events, because the Plot is nested in Trt, Day:Plot is also nested in Trt. To get _consistent_ results independent of model order you would need to reflect that additional nesting in the model: anova(lm(y~Trt+Plot+Day+Trt:Day:Plot+Trt:Day)) #vs anova(lm(y~Trt+Plot+Day+Trt:Day+Trt:Day:Plot)) and this time, the two models give identical results for all rows in the table. Somewhat reassuringly anova(lm(y~Day+Trt/Plot+Trt:Day+Trt:Day:Plot)) and even more reassuringly car's Anova does too. But there's another more fundamental thing that may suggest this is still not quite the right thing to do. If Plot is nested in Trt, it matters whether plot is random or fixed when asking about the treatment effect. I'd guess Plot is random. If plot is random and nested in Trt it would not be appropriate to simply compare any calculated Trt MS with the residual MS. Rather, one would compare the Trt MS with the Trt:Plot interaction. Or perhaps resort to a mixed effects model using lme (if Plot is the only random term) or lmer if Plot and Day are both random. ******************************************************************* This email and any attachments are confidential. 
From dwinsemius at comcast.net Thu May 5 20:58:46 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 14:58:46 -0400 Subject: [R] ANOVA In-Reply-To: <130635.1126.qm@web44709.mail.sp1.yahoo.com> References: <130635.1126.qm@web44709.mail.sp1.yahoo.com> Message-ID: On May 5, 2011, at 2:31 PM, Asan Ramzan wrote: > Hello R-Help > > How can I extract and store the "within group mean squared > difference" from an > anova summary table into a variable? In the absence of an example and code it's speculation, but something along the lines of: anova(fit)$`Mean Sq` ## might give you something to work with. > [[alternative HTML version deleted]] -- David Winsemius, MD West Hartford, CT From ehlers at ucalgary.ca Thu May 5 21:05:42 2011 From: ehlers at ucalgary.ca (P Ehlers) Date: Thu, 05 May 2011 13:05:42 -0600 Subject: [R] Using $ accessor in GAM formula In-Reply-To: <6A4060E3-3093-48A2-A83E-9E66E57530DE@comcast.net> References: <6A4060E3-3093-48A2-A83E-9E66E57530DE@comcast.net> Message-ID: <4DC2F506.4090306@ucalgary.ca> Gene, David has given you the preferred code. I just want to point out that the $-accessor is often not the best thing to use. Both dat[["y"]] and dat[, "y"] will work just fine. Peter Ehlers On 2011-05-05 12:06, David Winsemius wrote: > > On May 5, 2011, at 1:08 PM, Gene Leynes wrote: > >> This is not mission critical, but it's bothering me.
I'm getting >> inconsistent results when I use the $ accessor in the gam formula >> >> *In window #1:* >>> library(mgcv) >>> dat=data.frame(x=1:100,y=sin(1:100/50)+rnorm(100,0,.05)) >>> str(dat) >>> gam(dat$y~s(dat$x)) >> Error in eval(expr, envir, enclos) : object 'x' not found > > I get the same error, but using the standard R approach to passing > data and formulae to regression functions I have not difficulty: > > > gam(y~s(x), data=dat) > > Family: gaussian > Link function: identity > > Formula: > y ~ s(x) > > Estimated degrees of freedom: > 4.1552 total = 5.155229 > > GCV score: 0.002242771 > >>> >> >> *In window #2:* >>> gm = gam(dat$cf~s(dat$s)) >>> gm >> >> Family: gaussian >> Link function: identity >> >> Formula: >> dat$cf ~ s(dat$s) >> >> Estimated degrees of freedom: >> 8.7757 total = 9.77568980091 >> >> GCV score: 302.551417213 >> >> >> >> >> Has anyone else seen the same thing? >> >> In both cases I'm using Windows 7 and R 2.13.0 >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
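A small illustration of one reason the data= form is usually preferred, using lm() from base R as a stand-in rather than gam(); nothing here is a claim about mgcv internals. The data are simulated along the lines of the example in this thread, and the object names (fit_data, fit_dollar, new) are hypothetical. When the variables are written as dat$x inside the formula, predict() cannot match them to the columns of newdata:

```r
set.seed(42)
dat <- data.frame(x = 1:100)
dat$y <- sin(dat$x / 50) + rnorm(100, 0, 0.05)

fit_data   <- lm(y ~ x, data = dat)  # variables are looked up in 'dat'
fit_dollar <- lm(dat$y ~ dat$x)      # variables are hard-wired into the formula

new <- data.frame(x = c(101, 102, 103))

p_data <- predict(fit_data, newdata = new)
# The $ version warns that 'newdata' had 3 rows but the variables found have 100
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new))

length(p_data)    # 3: one prediction per row of 'new'
length(p_dollar)  # 100: 'new' has no column called dat$x, so the original data are reused
```

The same formula-plus-data habit also keeps the term labels in the printed output readable.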
From jabbba at gmail.com Thu May 5 21:15:37 2011 From: jabbba at gmail.com (Marco =?UTF-8?B?QmFyYsOgcmE=?=) Date: Thu, 5 May 2011 21:15:37 +0200 Subject: [R] help with a survplot In-Reply-To: <1304346610371-3490126.post@n4.nabble.com> References: <20110430164400.531dfe9a@caprica> <85D19B4D-6ED7-4A15-BC43-14F8F885C062@comcast.net> <20110502104105.1660c591@caprica> <1304346610371-3490126.post@n4.nabble.com> Message-ID: <20110505211537.6047fa4e@caprica> Il giorno Mon, 2 May 2011 07:30:10 -0700 (PDT) Frank Harrell ha scritto: > Please elaborate. It is simply that, generally speaking, i don't like adding numbers to a plot. I eventually realized that riskset cardinality may be a useful indication. However, do not discriminate between events and censored observations could generate some confusion, i think. Anyway, thanks to your software i was able to produce what i was requested for in very little time, so thank you very much. From Roger.Bivand at nhh.no Thu May 5 21:01:16 2011 From: Roger.Bivand at nhh.no (Roger Bivand) Date: Thu, 5 May 2011 12:01:16 -0700 (PDT) Subject: [R] Instrumental variable quantile estimation of spatial autoregressive models In-Reply-To: <99D322B6F7951F468D4788B2CB077C7B0BD10162@trip.uni.lux> References: <99D322B6F7951F468D4788B2CB077C7B0BD10162@trip.uni.lux> Message-ID: <1304622076830-3499190.post@n4.nabble.com> This is a very detailed question, and possibly ought to have been posted on R-sig-geo. Please re-post there, and see if anyone steps forward. As far as I am aware, no implementation exists. I suggest you also refer to Kostov (2009) in Spatial Economic Analysis volume 4. Roger Marie-Line Glaesener wrote: > > Dear all, > > I would like to implement a spatial quantile regression using instrumental > variable estimation (according to Su and Yang (2007), Instrumental > variable quantile estimation of spatial autoregressive models, SMU > economics & statistis working paper series, 2007, 05-2007, p.35 ). 
> > I am applying the hedonic pricing method on land transactions in > Luxembourg. My original data set contains 4335 observations. > I'm quite new to R and would like to ask if someone has implemented the > method proposed by Su and Yang in R or if anyone could give me a hint on > the different codes and steps? > Please find attached a small sample of my data and matrix. > R codes: > > library(foreign) > library(lmtest) > library(spdep) > library(quantreg) > > data<-read.table("DataSample.txt",header=TRUE, sep="") > attach(data) > > matrix<-read.gwt2nb("matrixsample.gwt" ,region.id=no_Trans) > matrix.listw<-nb2listw(matrix) > > OLS model > OLS<-lm(lnprice~surface+d2007+LUX+tsect_ci, data=data) > summary(OLS) > > SAR model > SAR<-lagsarlm(lnprice~surface+d2007+LUX+tsect_ci, data=data, listw = > matrix.listw) > summary(SAR) > > I hope that this information is sufficient and will help you to help me :) > > Many thanks in advance, > > Marie-Line Glaesener > > PhD student > Unit? de Recherche IPSE (Identit?s. Politiques, Soci?t?s, Espaces) > Laboratoire de G?ographie et Am?nagement du Territoire > > UNIVERSIT? DU LUXEMBOURG > CAMPUS WALFERDANGE > Route de Diekirch / BP 2 > L-7201 Walferdange > Luxembourg > www.geo.ipse.uni.lu<http://www.geo.ipse.uni.lu/> > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Roger Bivand Economic Geography Section Department of Economics Norwegian School of Economics and Business Administration Helleveien 30 N-5045 Bergen, Norway -- View this message in context: http://r.789695.n4.nabble.com/Instrumental-variable-quantile-estimation-of-spatial-autoregressive-models-tp3496637p3499190.html Sent from the R help mailing list archive at Nabble.com. 
From Ray.Brownrigg at ecs.vuw.ac.nz Thu May 5 21:20:21 2011
From: Ray.Brownrigg at ecs.vuw.ac.nz (Ray Brownrigg)
Date: Fri, 06 May 2011 07:20:21 +1200
Subject: [R] R CMD check warning
In-Reply-To:
References:
Message-ID: <4DC2F875.4050808@ecs.vuw.ac.nz>

On 6/05/2011 6:06 a.m., swaraj basu wrote:
> Dear All,
> I am trying to build a package for a set of functions. I am
> able to build the package and its working fine. When I check it with
> R CMD check
>
> I get a following warning : no visible global function
> definition for ‘biocLite’
>
> I have used biocLite to load a user defined library from
> within a function if that library is not pre-installed
>
> if(is.element(annotpkg, installed.packages()[,1]) == "FALSE"){
> source("http://www.bioconductor.org/biocLite.R")
> biocLite(annotpkg)
> library(annotpkg,character.only=TRUE)
> }
>
> Should I ignore this error or there is a workaround for the
> warning. My package is working fine though still I guess the warning has to
> have significance. Please help in clarifying this
> warning.
>
Your source() call generates the function biocLite() in such a way that R CMD check cannot know, so a workaround is to precede the source() statement with something like:

biocLite <- function(x) 0

[I don't know if there is a 'best' way to do this.]

In general Warnings are just that - they point you to a possible fault, but there is not enough information for the code to make a final determination.

HTH
Ray Brownrigg

From mtmorgan at fhcrc.org Thu May 5 21:24:13 2011
From: mtmorgan at fhcrc.org (Martin Morgan)
Date: Thu, 05 May 2011 12:24:13 -0700
Subject: [R] R CMD check warning
In-Reply-To:
References:
Message-ID: <4DC2F95D.6000208@fhcrc.org>

On 05/05/2011 11:06 AM, swaraj basu wrote:
> Dear All,
> I am trying to build a package for a set of functions. I am
> able to build the package and its working fine. When I check it with
> R CMD check
>
> I get a following warning : no visible global function
> definition for ‘biocLite’
> > I have used biocLite to load a user defined library from > within a function if that library is not pre-installed > > if(is.element(annotpkg, installed.packages()[,1]) == "FALSE"){ > source("http://www.bioconductor.org/biocLite.R") > biocLite(annotpkg) > library(annotpkg,character.only=TRUE) > } > > Should I ignore this error or there is a workaround for the > warning. My package is working fine though still I guess the warning has to > have significance. Please help in clarifying this > warning. Better to ask on the Bioconductor list http://bioconductor.org/help/mailing-list/ where you'll hit an audience familiar with annotation packages. As a possible model, see affy::cdfFromBioC or annotate::getAnnMap Martin > > [[alternative HTML version deleted]] > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 From adele_thompson at cargill.com Thu May 5 22:04:04 2011 From: adele_thompson at cargill.com (Schatzi) Date: Thu, 5 May 2011 13:04:04 -0700 (PDT) Subject: [R] Averaging uneven measurements by time with uneven numbers of measurements Message-ID: <1304625844550-3499337.post@n4.nabble.com> I have a new device that takes measurements anywhere from every second, to every 15 minutes (depending on changes). The matrix has a date, time and Y column (Y is the measurement). For three days it is 25,000 rows. How do I average the measurements by every 30 minutes so my matrix is 48 rows per day? I have been working on this and cannot figure out a simple method. Any ideas? Thank you. ----- In theory, practice and theory are the same. 
In practice, they are not - Albert Einstein -- View this message in context: http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499337.html Sent from the R help mailing list archive at Nabble.com. From clint at ecy.wa.gov Thu May 5 22:14:17 2011 From: clint at ecy.wa.gov (Clint Bowman) Date: Thu, 5 May 2011 13:14:17 -0700 (PDT) Subject: [R] Averaging uneven measurements by time with uneven numbers of measurements In-Reply-To: <1304625844550-3499337.post@n4.nabble.com> References: <1304625844550-3499337.post@n4.nabble.com> Message-ID: I'd be tempted to do a robust fit (loess?) to the data with a relatively small span (I'm assuming that there are errors in the measurements and some degree of smoothing is acceptable) then predict the fit at a regular interval (e.g., every 30 minutes). -- Clint Bowman INTERNET: clint at ecy.wa.gov Air Quality Modeler INTERNET: clint at math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 USPS: PO Box 47600, Olympia, WA 98504-7600 Parcels: 300 Desmond Drive, Lacey, WA 98503-1274 On Thu, 5 May 2011, Schatzi wrote: > I have a new device that takes measurements anywhere from every second, to > every 15 minutes (depending on changes). The matrix has a date, time and Y > column (Y is the measurement). For three days it is 25,000 rows. How do I > average the measurements by every 30 minutes so my matrix is 48 rows per > day? I have been working on this and cannot figure out a simple method. Any > ideas? Thank you. > > ----- > In theory, practice and theory are the same. In practice, they are not - Albert Einstein > -- > View this message in context: http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499337.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From adele_thompson at cargill.com Thu May 5 22:20:09 2011 From: adele_thompson at cargill.com (Schatzi) Date: Thu, 5 May 2011 13:20:09 -0700 (PDT) Subject: [R] Averaging uneven measurements by time with uneven numbers of measurements In-Reply-To: References: <1304625844550-3499337.post@n4.nabble.com> Message-ID: <1304626809070-3499386.post@n4.nabble.com> I do not want smoothing as the data should have jumps (it is weight left in feeding bunker). I was thinking of maybe using a histogram-like function and then averaging that. Not sure if this is possible. ----- In theory, practice and theory are the same. In practice, they are not - Albert Einstein -- View this message in context: http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499386.html Sent from the R help mailing list archive at Nabble.com. From eruizvar at uwo.ca Thu May 5 21:47:02 2011 From: eruizvar at uwo.ca (Estefania Ruiz Vargas) Date: Thu, 05 May 2011 15:47:02 -0400 Subject: [R] problem with cor() using bigmemory Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From worikr at gmail.com Thu May 5 22:55:08 2011 From: worikr at gmail.com (Worik R) Date: Fri, 6 May 2011 08:55:08 +1200 Subject: [R] Looking for equivalent for "strstr" Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From clint at ecy.wa.gov Thu May 5 22:57:47 2011 From: clint at ecy.wa.gov (Clint Bowman) Date: Thu, 5 May 2011 13:57:47 -0700 (PDT) Subject: [R] Averaging uneven measurements by time with uneven numbers of measurements In-Reply-To: <1304626809070-3499386.post@n4.nabble.com> References: <1304625844550-3499337.post@n4.nabble.com> <1304626809070-3499386.post@n4.nabble.com> Message-ID: In your first request for help you said, "How do I average the measurements by every 30 minutes?" With 25000 readings over three days, it looks as if you are getting readings just about every second. Okay, why don't you use the first reading as your initial weight, w0. Then subtract each succeeding reading from that to obtain the amount of feed dispensed. Now plot that value every 30 minutes. If you are interested in the variation of feed dispensed over a half hour interval, that can be easily obtained by accummulating those half-hour readings. -- Clint Bowman INTERNET: clint at ecy.wa.gov Air Quality Modeler INTERNET: clint at math.utah.edu Department of Ecology VOICE: (360) 407-6815 PO Box 47600 FAX: (360) 407-7534 Olympia, WA 98504-7600 USPS: PO Box 47600, Olympia, WA 98504-7600 Parcels: 300 Desmond Drive, Lacey, WA 98503-1274 On Thu, 5 May 2011, Schatzi wrote: > I do not want smoothing as the data should have jumps (it is weight left in > feeding bunker). I was thinking of maybe using a histogram-like function and > then averaging that. Not sure if this is possible. > > ----- > In theory, practice and theory are the same. In practice, they are not - Albert Einstein > -- > View this message in context: http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499386.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From ehlers at ucalgary.ca Thu May 5 22:57:45 2011 From: ehlers at ucalgary.ca (P Ehlers) Date: Thu, 05 May 2011 14:57:45 -0600 Subject: [R] Averaging uneven measurements by time with uneven numbers of measurements In-Reply-To: <1304626809070-3499386.post@n4.nabble.com> References: <1304625844550-3499337.post@n4.nabble.com> <1304626809070-3499386.post@n4.nabble.com> Message-ID: <4DC30F49.2040000@ucalgary.ca> On 2011-05-05 14:20, Schatzi wrote: > I do not want smoothing as the data should have jumps (it is weight left in > feeding bunker). I was thinking of maybe using a histogram-like function and > then averaging that. Not sure if this is possible. (It would be useful to include your original request - not everyone uses Nabble.) Actually, averaging *is* smoothing, but I suppose your intent is, for some reason, not to smooth across 30-minute boundaries. Perhaps you could use findInterval() to identify which measurements to average. Peter Ehlers > > ----- > In theory, practice and theory are the same. In practice, they are not - Albert Einstein > -- > View this message in context: http://r.789695.n4.nabble.com/Averaging-uneven-measurements-by-time-with-uneven-numbers-of-measurements-tp3499337p3499386.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
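
Peter's findInterval() suggestion can be sketched as follows. The data frame, its column names (time, Y) and the simulated readings are assumptions for illustration, since the thread gives no sample data; only findInterval(), seq() and tapply() from base R are used.

```r
# Hypothetical data: irregularly spaced timestamps over one day and a reading Y
set.seed(42)
start <- as.POSIXct("2011-05-05 00:00:00", tz = "UTC")
secs  <- sort(sample(0:(24 * 3600 - 1), 2000))        # uneven spacing
dat   <- data.frame(time = start + secs,
                    Y    = 100 - cumsum(runif(2000, 0, 0.05)))

# Left edges of the 48 half-hour bins that cover the day
breaks <- seq(start, by = "30 min", length.out = 48)

# For each reading, the index of the half-hour bin it falls in
bin <- findInterval(as.numeric(dat$time), as.numeric(breaks))

# Mean of Y within each bin; a bin with no readings simply does not appear
avg <- tapply(dat$Y, bin, mean)
res <- data.frame(bin_start = breaks[as.integer(names(avg))],
                  mean_Y    = as.vector(avg))
head(res)
```

Each reading contributes only to its own half-hour bin, so nothing is smoothed across the 30-minute boundaries. For several days of data the same idea applies with a longer breaks vector.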
From dwinsemius at comcast.net Thu May 5 23:00:39 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Thu, 5 May 2011 17:00:39 -0400
Subject: [R] Looking for equivalent for "strstr"
In-Reply-To:
References:
Message-ID:

On May 5, 2011, at 4:55 PM, Worik R wrote:

> Friends
>
> This is an elementary question. Is there a built in R function for
> finding a sub-string in another string? Like strstr in C.
>
> I can easily roll my own, but if there is a built in that is one
> less thing
> I can do wrong!

I have no acquaintance with that function, but perhaps:

?grep

regexpr : "returns an integer vector of the same length as text giving the starting position of the first match or -1 if there is none, ..."

> tst <- "abcde"; regexpr("bc", tst)
[1] 2
attr(,"match.length")
[1] 2

--
David Winsemius, MD
West Hartford, CT

From d_li at mit.edu Thu May 5 22:25:08 2011
From: d_li at mit.edu (Danielle Li)
Date: Thu, 5 May 2011 16:25:08 -0400
Subject: [R] Looping over graphs in igraph
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From abhishek.vit at gmail.com Thu May 5 23:15:31 2011
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Thu, 5 May 2011 14:15:31 -0700
Subject: [R] quick question : interpolating file name in pipe command
Message-ID:

Hi Guys

I am trying to read a bunch of files in the loop but pipe function which I use to cut few columns is somehow unable to interpolate the file variable.

eg:

> file="check.txt"
> data <- read.table(pipe("cut -f 2,3 file"), sep="\t", col.names=c('pos','cov') )
cut: file: No such file or directory

how can I pass variable file to pipe so that it can be interpolated.

Thanks!
-Abhi

From stevenkennedy2263 at gmail.com Thu May 5 23:20:33 2011
From: stevenkennedy2263 at gmail.com (Steven Kennedy)
Date: Fri, 6 May 2011 07:20:33 +1000
Subject: [R] Insert values to histogram
In-Reply-To: <1304596216055-3498140.post@n4.nabble.com>
References: <1304596216055-3498140.post@n4.nabble.com>
Message-ID:

Histograms plot data in bins - you don't get the exact value, because each bin contains a range of values. Do you want to plot the range of values the bin contains?

Also, check ?hist to see how to set the values of the breaks between the bins.

From david.j.meehan at gmail.com Thu May 5 23:30:42 2011
From: david.j.meehan at gmail.com (Rovinpiper)
Date: Thu, 5 May 2011 14:30:42 -0700 (PDT)
Subject: [R] ANOVA 1 too few degrees of freedom
In-Reply-To:
References: <1304451451151-3493349.post@n4.nabble.com>
Message-ID: <1304631042962-3499649.post@n4.nabble.com>

Thanks slre,

I seem to be making some progress now.

Using a colon instead of an asterisk in the code really changes things. I had been getting residual SS and MS of zero. Which is ridiculous. Now I get much more plausible values.

Also, When I used an asterisk instead of a colon It wouldn't give results for three way interactions. With colons it will.

You are correct about plot being nested within treatment. There are six plots in each of 2 treatments. So, I guess I will have to perform a separate analysis to quantify the effect of treatment.

Thanks again.

Analysis of Variance Table

Response: Combined.Rs
                                                    Df  Sum Sq Mean Sq  F value    Pr(>F)
Combined.Trt                                         1   52.80  52.805  96.2601 < 2.2e-16 ***
Combined.Plot                                       10  677.69  67.769 123.5380 < 2.2e-16 ***
as.factor(Combined.Day)                             16 2817.47 176.092 321.0041 < 2.2e-16 ***
Combined.Trt:as.factor(Combined.Day)                16   47.82   2.989   5.4487 4.048e-10 ***
Combined.Trt:Combined.Plot:as.factor(Combined.Day)  80  455.42   5.693  10.3776 < 2.2e-16 ***
Residuals                                          284  155.79   0.549
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

--
View this message in context: http://r.789695.n4.nabble.com/ANOVA-1-too-few-degrees-of-freedom-tp3493349p3499649.html
Sent from the R help mailing list archive at Nabble.com.

From gbeyderman at proclivitysystems.com Thu May 5 23:34:53 2011
From: gbeyderman at proclivitysystems.com (Gamliel Beyderman)
Date: Thu, 5 May 2011 17:34:53 -0400
Subject: [R] reading a column as a character vector
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From abhishek.vit at gmail.com Thu May 5 23:45:21 2011
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Thu, 5 May 2011 14:45:21 -0700
Subject: [R] quick question : interpolating file name in pipe command
In-Reply-To:
References:
Message-ID:

You can ignore my question I was able to figure out the way. I guess when I touch R after couple of weeks I am rusty.

-Abhi

On Thu, May 5, 2011 at 2:15 PM, Abhishek Pratap wrote:
> Hi Guys
>
> I am trying to read a bunch of files in the loop but pipe function
> which I use to cut few columns is somehow unable to interpolate the
> file variable.
>
>
> eg:
>
>> file="check.txt"
>> data <- read.table(pipe("cut -f 2,3 file"), sep="\t", col.names=c('pos','cov') )
> cut: file: No such file or directory
>
> how can I pass variable file to pipe so that it can be interpolated.
>
> Thanks!
> -Abhi
>

From dwinsemius at comcast.net Thu May 5 23:50:11 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Thu, 5 May 2011 17:50:11 -0400
Subject: [R] quick question : interpolating file name in pipe command
In-Reply-To:
References:
Message-ID: <6CFE0821-6DE1-47D1-A0B5-E382D6A3FD79@comcast.net>

On May 5, 2011, at 5:15 PM, Abhishek Pratap wrote:

> Hi Guys
>
> I am trying to read a bunch of files in the loop but pipe function
> which I use to cut few columns is somehow unable to interpolate the
> file variable.
>
>
> eg:
>
>> file="check.txt"
>> data <- read.table(pipe("cut -f 2,3 file"), sep="\t",
>> col.names=c('pos','cov') )

I do not see where you allowed the substitution of the character vector file into the pipe argument. Perhaps:

data <- read.table(pipe(paste("cut -f 2,3", file)), sep="\t", col.names=c('pos','cov') )

(Bad practice to name variables "file".)

I didn't use tabs but rather spaces: check.txt was a single line file
ttt tt rr ttt

> data <- read.table(pipe(paste("cut -f 2,3", file)), col.names=c('pos','cov') )
> data
  pos cov
1 ttt  tt
2  rr ttt

> cut: file: No such file or directory
>
> how can I pass variable file to pipe so that it can be interpolated.
>
> Thanks!
> -Abhi
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

From ggrothendieck at gmail.com Thu May 5 23:58:22 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Thu, 5 May 2011 17:58:22 -0400
Subject: [R] Looking for equivalent for "strstr"
In-Reply-To:
References:
Message-ID:

On Thu, May 5, 2011 at 4:55 PM, Worik R wrote:
> Friends
>
> This is an elementary question. Is there is a built in R function for
> finding a sub-string in another string? Like strstr in C.
>
> I can easily roll my own, but if there is a built in that is one less thing
> I can do wrong!
>

Try ?substr and ?substring

The character processing function help pages are listed here:

help.search(keyword = "character", package = "base")

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From rolf.turner at xtra.co.nz Thu May 5 23:58:50 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Fri, 06 May 2011 09:58:50 +1200 Subject: [R] Using $ accessor in GAM formula In-Reply-To: <4DC2F506.4090306@ucalgary.ca> References: <6A4060E3-3093-48A2-A83E-9E66E57530DE@comcast.net> <4DC2F506.4090306@ucalgary.ca> Message-ID: <4DC31D9A.3020402@xtra.co.nz> On 06/05/11 07:05, P Ehlers wrote: > Gene, > > David has given you the preferred code. I just want to > point out that the $-accessor is often not the best > thing to use. Both dat[["y"]] and dat[, "y"] will work > just fine. Admittedly one should use the preferred code, i.e. gam(y ~ s(x),data=dat) and avoid all the hoo-hah, but it's strange that the dodgey code throws an error with gam(dat1$y ~ s(dat1$x)) but not with gam(dat2$cf ~ s(dat2$s)) --- where dat1 and dat2 are identical except for their column names. Why are the column names having an impact upon finding the variables/objects in question? And it isn't just "x" and "y" that cause the error; I tried naming the columns "u" and "v" and the error was thrown in that case also. It seems that having that name "s" involved keeps the error from happening. It's probably not coincidental that that is the name of a *function* that gam() uses. Something a bit subtle is going on; it would be nice to be able to understand it. Just out of pure academic interest. :-) cheers, Rolf Turner From jonsleepy at gmail.com Fri May 6 00:28:47 2011 From: jonsleepy at gmail.com (J) Date: Thu, 5 May 2011 18:28:47 -0400 Subject: [R] factors Message-ID: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From dwinsemius at comcast.net Fri May 6 00:37:43 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 18:37:43 -0400 Subject: [R] factors In-Reply-To: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com> References: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com> Message-ID: <83B81DBC-8024-44BA-A6CB-BE474795C9A7@comcast.net> On May 5, 2011, at 6:28 PM, J wrote: > Hi, I'm requesting you don't berate me for asking this question: > > I clearly don't have the gist of factors. > > I have two dataframes, A and B. > > Each of them has a column containing strings (they're labels). > > I want to, one-by-one in a loop, compare the particular string in an > entry from dataframe A to an entry in B, to see if they're the same. > > The problem, when posing the question: > searchID1 <- A[['label']][i] > possMatch1 <- B[['label']][j] > searchID1 == possMatch1 > > I get the error: > Error in Ops.factor(searchID1, possMatch1) : > level sets of factors are different > > I presume this is because the set of possible values in the 'labels' > columns, respectively in A and B, differ. In my case, I'm not > interested in this at all; I just want to compare individual entries > from the two dataframes in a pair-wise fashion. > > Can I strip the "factors" associated with the entries? Is there a > better way? Use as.character around the factors. 
Assuming they have equal numbers of rows, then no loop is needed:

searchID1 <- as.character(A[['label']])
possMatch1 <- as.character(B[['label']])
searchID1 == possMatch1 # returns pair-wise logical vector

And if the numbers of rows are not equal, then you need to express your problem more clearly (and the dyadic function %in% is probably needed).

--
David Winsemius, MD
West Hartford, CT

From izahn at psych.rochester.edu Fri May 6 00:39:34 2011
From: izahn at psych.rochester.edu (Ista Zahn)
Date: Thu, 5 May 2011 18:39:34 -0400
Subject: [R] factors
In-Reply-To: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com>
References: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com>
Message-ID:

Hi Jonathan,

On Thu, May 5, 2011 at 6:28 PM, J wrote:
> Hi, I'm requesting you don't berate me for asking this question:
>
> I clearly don't have the gist of factors.
>
> I have two dataframes, A and B.
>
> Each of them has a column containing strings (they're labels).
>
> I want to, one-by-one in a loop, compare the particular string in an entry from dataframe A to an entry in B, to see if they're the same.

This is probably not the best way. Much easier to compare them all in one go.

>
> The problem, when posing the question:
> searchID1 <- A[['label']][i]
> possMatch1 <- B[['label']][j]
> searchID1 == possMatch1
>
> I get the error:
> Error in Ops.factor(searchID1, possMatch1) :
>  level sets of factors are different
>
> I presume this is because the set of possible values in the 'labels' columns, respectively in A and B, differ. In my case, I'm not interested in this at all; I just want to compare individual entries from the two dataframes in a pair-wise fashion.
>
> Can I strip the "factors" associated with the entries? Is there a better way?

Yes, you can "strip the factors" using as.character(). Alternatively, you could set the factor levels using factor() or levels. Either way, I would just do

A$label == B$label

instead of a loop.

Best,
Ista

>
> Jonathan
>
>
?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From Achim.Zeileis at uibk.ac.at Fri May 6 00:41:01 2011 From: Achim.Zeileis at uibk.ac.at (Achim Zeileis) Date: Fri, 6 May 2011 00:41:01 +0200 (CEST) Subject: [R] hurdle, simulated power In-Reply-To: <4DC16D1A.4000904@u.washington.edu> References: <4DC16D1A.4000904@u.washington.edu> Message-ID: Dave: > We are planning an intervention study for adolescent alcohol use, and I > am planning to use simulations based on a hurdle model (using the > hurdle() function in package pscl) for sample size estimation. > > The simulation code and power code are below -- note that at the moment > the "power" code is just returning the coefficients, as something isn't > working quite right. Hmmm, strange. I didn't see the problem either. Some parameters did not seem to be passed correctly but I did not see it. But as I saw a few things that could be done more elegantly or in a way that they are easier to read (for me), I simply rewrote the code almost from scratch. For example, I now use plogis() for the inverse logit, rbinom() for drawing binomial random numbers, colMeans() for column means, and coeftest()/waldtest() for computing the p-values from the Wald tests. The code is included below. Both the simulation of the coefficients and the simulation of associated power values seem to yield plausible results. 
Hope that helps,
Z

dgp <- function(nobs = 1000, beta0 = 2.5, beta1 = -0.2,
  alpha0 = -0.9, alpha1 = 0.2, theta = 2.2)
{
  trt <- rep(0:1, length.out = nobs)
  pr <- plogis(alpha0 + alpha1 * trt)
  y1 <- rbinom(nobs, size = 1, prob = pr)
  mu <- exp(beta0 + beta1 * trt)
  y2 <- rnegbin(nobs, mu = mu, theta = theta)
  while(any(y2_0 <- y2 < 1L))
    y2[y2_0] <- rnegbin(sum(y2_0), mu = mu[y2_0], theta = theta)
  data.frame(trt = trt, y = y1 * y2)
}

coefsim <- function(nrep = 100, ...)
{
  nam <- c("beta0", "beta1", "alpha0", "alpha1", "theta")
  rval <- matrix(NA, nrow = nrep, ncol = length(nam))
  for(i in 1:nrep) {
    dat <- dgp(...)
    m <- hurdle(y ~ trt, data = dat, dist = "negbin")
    rval[i,] <- c(coef(m), m$theta)
  }
  colnames(rval) <- nam
  colMeans(rval)
}

powersim <- function(nrep = 100, level = 0.05, ...)
{
  nam <- c("beta1", "alpha1", "both")
  rval <- matrix(NA, nrow = nrep, ncol = length(nam))
  for(i in 1:nrep) {
    dat <- dgp(...)
    m <- hurdle(y ~ trt, data = dat, dist = "negbin")
    m0 <- hurdle(y ~ 1, data = dat, dist = "negbin")
    rval[i,] <- c(coeftest(m)[c(2, 4), 4], waldtest(m0, m)[2, 4])
  }
  colnames(rval) <- nam
  colMeans(rval < level)
}

library("pscl")
library("lmtest")

set.seed(1)
out1 <- coefsim()
round(out1, digits = 3)

set.seed(1)
out2 <- powersim()
round(out2, digits = 3)

> The average estimates from code below are:
>
> count_(Intercept)         count_trt  zero_(Intercept)
>       2.498327128      -0.000321315       0.910293501
>          zero_trt
>      -0.200134813
>
> Three of the four look right (ie, converging to population values), but the
> count_trt is stuck at zero, regardless of sample size (when it should be ~
> -0.20).
>
> Does anyone see what's wrong?
>
> Thanks for any input.
>
> cheers, Dave
>
>
>
> mysim <- function(n, beta0, beta1, alpha0, alpha1, theta){
>     trt <- c(rep(0,n), rep(1,n))
>     ### mean function logit model
>     p0 <- exp(alpha0 + alpha1*trt)/(1 + exp(alpha0 + alpha1*trt))
>     ### 0 / 1 based on p0
>     y1 <- as.numeric(runif(n)>p0)
>     ### mean function count portion
>     mu <- exp(beta0 + beta1*trt)
>     ### estimate counts using NB dist
>     require(MASS, quietly = TRUE)
>     y2 <- rnegbin(n, mu = mu, theta = theta)
>     ### if y2 = 0, draw new value
>     while(sum(y2==0)>0){
>         y2[which(y2==0)] <- rnegbin(length(which(y2==0)),
>                                     mu=mu[which(y2==0)], theta = theta)
>     }
>     y<-y1*y2
>     data.frame(trt=trt,y=y)
> }
> #alpha0, alpha1 is the parameter for zero part
> #beta0,beta1 is the parameter for negative binomial
> #theta is dispersion parameter for negative binomial, infinity correspond to poisson
> #
>
> #example power analysis
> #return three power, power1 for zero part, power2 for negative binomial part
> #power3 for joint test,significance level can be set, default is 0.05
> #M is simulation time
> #require pscl package
> #library(pscl)
>
> mypower <- function(n, beta0, beta1, alpha0, alpha1, theta, siglevel=0.05,
>                     M=1000){
>     myfun <- function(n,beta0,beta1,alpha0,alpha1,theta,siglevel){
>         data <- mysim(n,beta0,beta1,alpha0,alpha1,theta)
>         require(pscl, quietly = TRUE)
>         res <- hurdle(y ~ trt, data = data, dist = "negbin", trace = FALSE)
>         est <- coef(res)#[c(2,4)]
>         #v<-res$vcov[c(2,4),c(2,4)]
>         #power1<-as.numeric(2*pnorm(-abs(est)[2]/sqrt(v[2,2]))
>         #power2<-as.numeric(2*pnorm(-abs(est)[1]/sqrt(v[1,1]))
>         #power3<-as.numeric((1-pchisq(t(est)%*%solve(v)%*%est,df=2))
>         #c(power1,power2,power3)
>     }
>     r <- replicate(M, myfun(n,beta0,beta1,alpha0,alpha1,theta,siglevel),
>                    simplify=TRUE)
>     apply(r, 1, mean)
> }
>
> out <- mypower(n = 1000, beta0 = 2.5, beta1 = -0.20,
>                alpha0 = -0.90, alpha1 = 0.20,
>                theta = 2.2, M = 100)
> out
>
>
> --
> Dave Atkins, PhD
> Research Associate Professor
> Department of Psychiatry and Behavioral Science
> University
of Washington > datkins at u.washington.edu > > Center for the Study of Health and Risk Behaviors (CSHRB) > 1100 NE 45th Street, Suite 300 > Seattle, WA 98105 > 206-616-3879 > http://depts.washington.edu/cshrb/ > (Mon-Wed) > > Center for Healthcare Improvement, for Addictions, Mental Illness, > Medically Vulnerable Populations (CHAMMP) > 325 9th Avenue, 2HH-15 > Box 359911 > Seattle, WA 98104 > http://www.chammp.org > (Thurs) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From fabon.dzogang at lip6.fr Fri May 6 00:43:17 2011 From: fabon.dzogang at lip6.fr (Fabon Dzogang) Date: Fri, 6 May 2011 00:43:17 +0200 Subject: [R] [caret package] [trainControl] supplying predefined partitions to train with cross validation Message-ID: Hi all, I run R 2.11.1 under ubuntu 10.10 and caret version 2.88. I use the caret package to compare different models on a dataset. In order to compare their different performances I would like to use the same data partitions for every models. I understand that using a LGOCV or a boot type re-sampling method along with the "index" argument of the trainControl function, one is able to supply a training partition to the train function. However, I would like to apply a 10-fold cross validation to validate the models and I did not find any way to supply some predefined partition (created with createFolds) in this setting. Any help ? Thank you and great package by the way ! Fabon Dzogang.
From ggrothendieck at gmail.com Fri May 6 00:45:47 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Thu, 5 May 2011 18:45:47 -0400 Subject: [R] factors In-Reply-To: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com> References: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com> Message-ID: On Thu, May 5, 2011 at 6:28 PM, J wrote: > Hi, I'm requesting you don't berate me for asking this question: > > I clearly don't have the gist of factors. > > I have two dataframes, A and B. > > Each of them has a column containing strings (they're labels). > > I want to, one-by-one in a loop, compare the particular string in an entry from dataframe A to an entry in B, to see if they're the same. > > The problem, when posing the question: > searchID1 <- A[['label']][i] > possMatch1 <- B[['label']][j] > searchID1 == possMatch1 > > I get the error: > Error in Ops.factor(searchID1, possMatch1) : > level sets of factors are different > > I presume this is because the set of possible values in the 'labels' columns, respectively in A and B, differ. In my case, I'm not interested in this at all; I just want to compare individual entries from the two dataframes in a pair-wise fashion. > > Can I strip the "factors" associated with the entries? Is there a better way?
> Assuming you read them in using read.table, use read.table(...whatever..., as.is = TRUE) and then it will read them as character strings rather than factors. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From jonsleepy at gmail.com Fri May 6 00:52:18 2011 From: jonsleepy at gmail.com (Jonathan) Date: Thu, 5 May 2011 18:52:18 -0400 Subject: [R] factors In-Reply-To: References: <14949803-AAAA-4EBD-8F8D-9D9C7C657D77@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From vipul.bhushan at geodecapital.com Thu May 5 23:47:35 2011 From: vipul.bhushan at geodecapital.com (Bhushan, Vipul) Date: Thu, 5 May 2011 17:47:35 -0400 Subject: [R] Using GSUB to obtain a printing "\" Message-ID: Hello. I'd like to be able to print variable strings which contain "\" as-is, without interpreting (for example) "abcde\nuvxyz" as having an embedded newline (or whatever other escaped instruction). To do this, I've tried gsub, and here's some of my output (I've tried all kinds of variations to the arguments): > message(gsub("[\\]","\\\\","dsff\nfsd")) dsff fsd but I would like to see: dsff\nfsd (which is the output of message("dsff\\nfsd") ) I don't know why gsub doesn't accomplish this substitution. How can I replace (for example), a "\" with "\\"? Any suggestions would be greatly appreciated. (I know I can use another language, or use inelegant brute force by writing a new routine which cycles through each possible character that could follow the single "\".) Watching the gsub echo at the R prompt shows that it is gsub (and not message) not behaving as I expected. Other information which the posting guidelines say might help: > sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-apple-darwin10.7.0/x86_64 (64-bit) locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] Rd2roxygen_0.1-8 roxygen_0.1-2 digest_0.4.2 loaded via a namespace (and not attached): [1] tools_2.13.0 > Sys.getlocale() [1] "C" Thanks very much! From momadou at yahoo.fr Fri May 6 00:04:45 2011 From: momadou at yahoo.fr (Komine) Date: Thu, 5 May 2011 15:04:45 -0700 (PDT) Subject: [R] Draw a nomogram after glm In-Reply-To: <1304600157752-3498279.post@n4.nabble.com> References: <1304596297220-3498144.post@n4.nabble.com> <1304600157752-3498279.post@n4.nabble.com> Message-ID: <1304633085793-3499771.post@n4.nabble.com> Thanks Frank, I will try the rms package and report the result afterwards. Komine -- View this message in context: http://r.789695.n4.nabble.com/Draw-a-nomogram-after-glm-tp3498144p3499771.html Sent from the R help mailing list archive at Nabble.com. From wdunlap at tibco.com Fri May 6 01:19:39 2011 From: wdunlap at tibco.com (William Dunlap) Date: Thu, 5 May 2011 16:19:39 -0700 Subject: [R] Using GSUB to obtain a printing "\" In-Reply-To: References: Message-ID: <77EB52C6DD32BA4D87471DCD70C8D7000433787F@NA-PA-VBE03.na.tibco.com> gsub does this because the string "dsff\nfsd" does not contain a backslash - the 5th character is a newline. The deparsed representation (used for printing the string) of a newline is "\n" but the string itself has no backslash. You can feed the output of deparse into message (or cat) so they show the backslash-n characters, but you then have to remove the enclosing quotes that deparse adds.
> x<-"dsff\nfs\rd\1"
> x
[1] "dsff\nfs\rd\001"
> message("x is ", deparse(x))
x is "dsff\nfs\rd\001"
> message("x is ", gsub("^\"|\"$", "", deparse(x)))
x is dsff\nfs\rd\001
> cat("x is", gsub("^\"|\"$", "", deparse(x)), "\n")
x is dsff\nfs\rd\001
> cat(x) # this is with Windows Rgui.exe
dsff
fs d>
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Bhushan, Vipul > Sent: Thursday, May 05, 2011 2:48 PM > To: r-help at r-project.org > Subject: [R] Using GSUB to obtain a printing "\" > > Hello. I'd like to be able to print variable strings which > contain "\" as-is, without interpreting (for example) > "abcde\nuvxyz" as having an embedded newline (or whatever > other escaped instruction). > > To do this, I've tried gsub, and here's some of my output > (I've tried all kinds of variations to the arguments): > > > message(gsub("[\\]","\\\\","dsff\nfsd")) > dsff > fsd > > but I would like to see: > dsff\nfsd > (which is the output of message("dsff\\nfsd") ) > > I don't know why gsub doesn't accomplish this substitution. > How can I replace (for example), a "\" with "\\"? Any > suggestions would be greatly appreciated. (I know I can use > another language, or use inelegant brute force by writing a > new routine which cycles through each possible character that > could follow the single "\".) Watching the gsub echo at the R > prompt shows that it is gsub (and not message) not behaving > as I expected. > > > Other information which the posting guidelines say might help: > > sessionInfo() > R version 2.13.0 (2011-04-13) > Platform: x86_64-apple-darwin10.7.0/x86_64 (64-bit) > > locale: > [1] C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] Rd2roxygen_0.1-8 roxygen_0.1-2 digest_0.4.2
> > loaded via a namespace (and not attached): > [1] tools_2.13.0 > > Sys.getlocale() > [1] "C" > > Thanks very much! > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From A.Robinson at ms.unimelb.edu.au Fri May 6 01:33:02 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Fri, 6 May 2011 09:33:02 +1000 Subject: [R] functions pandit and treebase in the package apTreeshape In-Reply-To: <706AD256-C731-48AB-BFCE-4715BA28ACAF@uib.es> References: <706AD256-C731-48AB-BFCE-4715BA28ACAF@uib.es> Message-ID: <20110505233302.GG11866@ms.unimelb.edu.au> Hi Arnau, please send the output of sessionInfo() and the exact commands and response that you used to install and load apTreeshape. Cheers Andrew On Thu, May 05, 2011 at 04:42:58PM +0200, Arnau Mir wrote: > Hello. > > I'm trying to use the functions pandit and treebase. They are in the package apTreeshape. Once I've loaded the package, R responses: > > - no function pandit/treebase. > > Somebody knows why or what is the reason? > > > Thanks, > > Arnau. > ------------------------------------------------------------ > Arnau Mir Torres > Edifici A. Turmeda > Campus UIB > Ctra. Valldemossa, km. 7,5 > 07122 Palma de Mca. > tel: (+34) 971172987 > fax: (+34) 971173003 > email: arnau.mir at uib.es > URL: http://dmi.uib.es/~arnau > ------------------------------------------------------------ > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Fri May 6 01:37:02 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Fri, 6 May 2011 09:37:02 +1000 Subject: [R] Vermunt's LEM in R In-Reply-To: References: <81D92650D162664791CB7C482A878412055A3D32@kmbx2.utk.tennessee.edu> Message-ID: <20110505233702.GH11866@ms.unimelb.edu.au> Hi David, you might have more luck with your request if you tell us what Vermunt's LEM *does*, and provided some links to introductory reading material ... Cheers Andrew On Thu, May 05, 2011 at 03:34:02PM +0000, David Joubert wrote: > > Hello- > > Does anyone know of packages that could emulate what J. Vermunt's LEM does ? What is the closest relative in R ? > I use both R and LEM but have trouble transforming my multiway tables in R into a .dat file compatible with LEM. > > Thanks, > > David Joubert > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From A.Robinson at ms.unimelb.edu.au Fri May 6 01:55:17 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Fri, 6 May 2011 09:55:17 +1000 Subject: [R] two-way group mean prediction in survreg with three factors In-Reply-To: <226A90808C6948B9AD39F895F91941B9@statistics.vt.edu> References: <20110505021249.GB11866@ms.unimelb.edu.au> <226A90808C6948B9AD39F895F91941B9@statistics.vt.edu> Message-ID: <20110505235517.GB11856@ms.unimelb.edu.au> Even then, I think that there's a problem. If C is in the model, then the response varies by C. The simplest way is to pick a value for C, and then evaluate the group mean estimates of A and B (and C). Something in my brain keeps asking whether another way to marginalize C for the purposes of predicting A and B is just to remove it from the model, or alternatively to make it a random effect. Neither idea seems rock solid at this point. Cheers Andrew On Thu, May 05, 2011 at 09:37:15AM -0400, Pang Du wrote: > Oops, I hope not too. Don't know why I had the brackets around B+C. My > model is actually A*B+C. And I'm not sure how to obtain the two-way > prediction of AB with C marginalized. Thanks. > > Pang > > -----Original Message----- > From: Andrew Robinson [mailto:A.Robinson at ms.unimelb.edu.au] > Sent: Wednesday, May 04, 2011 10:13 PM > To: Pang Du > Cc: r-help at r-project.org > Subject: Re: [R] two-way group mean prediction in survreg with three factors > > I hope not! > > Facetiousness aside, the model that you have fit contains C, and, > indeed, an interaction between A and C. 
So, the effect of A upon the > response variable depends on the level of C. The summary you want > must marginalize C somehow, probably by a weighted or unweighted > average across its levels. What does that summary really mean? Can > you meaningfully average across the levels of a predictor that is > included in the model as a main and an interaction term? > > Best wishes > > Andrew > > On Wed, May 04, 2011 at 12:24:50PM -0400, Pang Du wrote: > > I'm fitting a regression model for censored data with three categorical > > predictors, say A, B, C. My final model based on the survreg function is > > > > Surv(..) ~ A*(B+C). > > > > I know the three-way group mean estimates can be computed using the > predict > > function. But is there any way to obtain two-way group mean estimates, say > > estimated group mean for (A1, B1)-group? The sample group means don't > > incorporate censoring and thus may not be appropriate here. > > > > > > > > Pang Du > > > > Virginia Tech > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > -- > Andrew Robinson > Program Manager, ACERA > Department of Mathematics and Statistics Tel: +61-3-8344-6410 > University of Melbourne, VIC 3010 Australia (prefer email) > http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 > http://www.acera.unimelb.edu.au/ > > Forest Analytics with R (Springer, 2011) > http://www.ms.unimelb.edu.au/FAwR/ > Introduction to Scientific Programming and Simulation using R (CRC, 2009): > http://www.ms.unimelb.edu.au/spuRs/ -- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From kebennett at alaska.edu Fri May 6 03:23:56 2011 From: kebennett at alaska.edu (Katrina Bennett) Date: Thu, 5 May 2011 17:23:56 -0800 Subject: [R] Installing rgdal in R: correct -configure flags for GDAL install on Linux Redhat Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From LeeSchulz at agecon.ksu.edu Fri May 6 03:10:22 2011 From: LeeSchulz at agecon.ksu.edu (Lee Schulz) Date: Thu, 5 May 2011 20:10:22 -0500 Subject: [R] MacKinnon critical value Message-ID: <783DFB3F8DDE3E4B88604A9ACDA0D84104405430@ageconnt.agecon.ksu.edu> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From f.harrell at vanderbilt.edu Fri May 6 03:42:11 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Thu, 5 May 2011 18:42:11 -0700 (PDT) Subject: [R] help with a survplot In-Reply-To: <20110505211537.6047fa4e@caprica> References: <20110430164400.531dfe9a@caprica> <85D19B4D-6ED7-4A15-BC43-14F8F885C062@comcast.net> <20110502104105.1660c591@caprica> <1304346610371-3490126.post@n4.nabble.com> <20110505211537.6047fa4e@caprica> Message-ID: <1304646131260-3500488.post@n4.nabble.com> Hi Marco, You're welcome. The number at risk at given time points is a fairly standard thing to add to survival plots. I don't think many people will confuse it with numbers of events. Personally I find shaded confidence bands a bit more helpful but both are useful. Frank Marco Barb?ra-2 wrote: > > Il giorno Mon, 2 May 2011 07:30:10 -0700 (PDT) > Frank Harrell <f.harrell at vanderbilt.edu> ha scritto: > >> Please elaborate. > > It is simply that, generally speaking, i don't like adding numbers to a > plot. I eventually realized that riskset cardinality may be a useful > indication. However, do not discriminate between events and censored > observations could generate some confusion, i think. > > Anyway, thanks to your software i was able to produce what i was > requested for in very little time, so thank you very much. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/help-with-a-survplot-tp3485998p3500488.html Sent from the R help mailing list archive at Nabble.com. 
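For readers who want to see what "number at risk at given time points" amounts to, here is a minimal sketch. It deliberately uses only the survival package (not rms::survplot, which is what was used in this thread), and the lung data set, the time grid and the label position are all assumptions chosen for illustration.

```r
library(survival)

# Illustrative data: the lung data set shipped with the survival package.
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Time points at which the size of the risk set is reported (arbitrary grid).
times <- seq(0, 1000, by = 250)
nrisk <- summary(fit, times = times)$n.risk

# Kaplan-Meier curve with confidence limits, plus the risk-set sizes
# written just above the x axis.
plot(fit, conf.int = TRUE, xlab = "Days", ylab = "Survival probability")
text(times, 0.02, labels = nrisk, cex = 0.8)
```

The numbers count subjects still under observation just before each time point, so they fall both when an event occurs and when an observation is censored, which is the ambiguity raised in the quoted message.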
From A.Robinson at ms.unimelb.edu.au Fri May 6 04:16:22 2011 From: A.Robinson at ms.unimelb.edu.au (Andrew Robinson) Date: Fri, 6 May 2011 12:16:22 +1000 Subject: [R] Installing rgdal in R: correct -configure flags for GDAL install on Linux Redhat In-Reply-To: References: Message-ID: <20110506021622.GJ11866@ms.unimelb.edu.au> Hi Katrina, the error message below is actually pretty explicit. Have you installed the PROJ.4 library? If not, then you need to install it. When I had to do this I think that I used macports: sudo port install proj Then you need to tell configure where to find it, using the protocol suggested below. I hope that this helps. Andrew On Thu, May 05, 2011 at 05:23:56PM -0800, Katrina Bennett wrote: > Hi, I'm installing rgdal but I keep having failures because I have not been > able to find a good source of information for the correct configuration > settings when installing GDAL. > > My error from the R install.packages("rgdal") is below. > > Can someone point me to a good source to tell me how to set after > ./configure when installing GDAL? > > I'd like to be able to work with raster images, geotiffs, netCDF files, and > other raster-based image processing in R. > > Right now I'm running ./configure for GDAL as follows: > > ./configure --prefix=$HOME/local/gdal/gdal-1.8.0 --with > jasper=$HOME/local/jasper/jasper-1.900.1.uuid/src/libjasper > > Any insight would be greatly appreciated! Thank you. > > Error output from R: > * installing *source* package 'rgdal' ... > gdal-config: gdal-config > checking for gcc... gcc -std=gnu99 > checking for C compiler default output file name... a.out > checking whether the C compiler works... yes > checking whether we are cross compiling... no > checking for suffix of executables... > checking for suffix of object files... o > checking whether we are using the GNU C compiler... yes > checking whether gcc -std=gnu99 accepts -g... yes > checking for gcc -std=gnu99 option to accept ANSI C... 
none needed > checking how to run the C preprocessor... gcc -std=gnu99 -E > checking for egrep... grep -E > checking for ANSI C header files... yes > checking for sys/types.h... yes > checking for sys/stat.h... yes > checking for stdlib.h... yes > checking for string.h... yes > checking for memory.h... yes > checking for strings.h... yes > checking for inttypes.h... yes > checking for stdint.h... yes > checking for unistd.h... yes > checking proj_api.h usability... no > checking proj_api.h presence... no > checking for proj_api.h... no > Error: proj_api.h not found. > If the PROJ.4 library is installed in a non-standard location, > use --configure-args='--with-proj- > include=/opt/local/include' > for example, replacing /opt/local/* with appropriate values > for your installation. If PROJ.4 is not installed, install it. > ERROR: configuration failed for package 'rgdal' > * removing > '/import/home/u1/uaf/kbennett/R/x86_64-unknown-linux-gnu-library/2.11/rgdal' > > > > > > > > > -- > Katrina E. Bennett > PhD Student > University of Alaska Fairbanks > International Arctic Research Center > 930 Koyukuk Drive, PO Box 757340 > Fairbanks, Alaska 99775-7340 > 907-474-1939 office > 907-385-7657 cell > kebennett at alaska.edu > > > Personal Address: > UAF, PO Box 752525 > Fairbanks, Alaska 99775-2525 > bennett.katrina at gmail.com > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
-- Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/ From dwinsemius at comcast.net Fri May 6 04:45:46 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 5 May 2011 22:45:46 -0400 Subject: [R] MacKinnon critical value In-Reply-To: <783DFB3F8DDE3E4B88604A9ACDA0D84104405430@ageconnt.agecon.ksu.edu> References: <783DFB3F8DDE3E4B88604A9ACDA0D84104405430@ageconnt.agecon.ksu.edu> Message-ID: On May 5, 2011, at 9:10 PM, Lee Schulz wrote: > Hello, > > > > I am doing an Engle Granger test on the residuals of two I(1) > processes. > I would like to get the MacKinnon (1996) critical value, say at > 10%. I > have 273 observations with 5 integrated explanatory variables , so > that > k=4. Could someone help me with the procedure in R? See if this helps: http://search.r-project.org/cgi-bin/namazu.cgi?query=Engle+Granger+mackinnon&max=100&result=normal&sort=score&idxname=functions&idxname=Rhelp08&idxname=Rhelp10&idxname=Rhelp02 -- David Winsemius, MD West Hartford, CT From ronggui.huang at gmail.com Fri May 6 06:13:21 2011 From: ronggui.huang at gmail.com (Wincent) Date: Fri, 6 May 2011 12:13:21 +0800 Subject: [R] Vermunt's LEM in R In-Reply-To: References: <81D92650D162664791CB7C482A878412055A3D32@kmbx2.utk.tennessee.edu> Message-ID: I guess LEM is a software for latent class analysis. If so, you may want to have a look at poLCA package. Regards Ronggui On 5 May 2011 23:34, David Joubert wrote: > > Hello- > > Does anyone know of packages that could emulate what J. Vermunt's LEM does ? What is the closest relative in R ? 
> I use both R and LEM but have trouble transforming my multiway tables in R into a .dat file compatible with LEM. > > Thanks, > > David Joubert > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Wincent Ronggui HUANG Sociology Department of Fudan University PhD of City University of Hong Kong http://asrr.r-forge.r-project.org/rghuang.html From timhesterberg at gmail.com Fri May 6 06:16:42 2011 From: timhesterberg at gmail.com (Tim Hesterberg) Date: Thu, 05 May 2011 21:16:42 -0700 Subject: [R] memory and bootstrapping In-Reply-To: (message from Prof Brian Ripley on Thu, 5 May 2011 09:01:46 +0100 (BST)) References: Message-ID: (Regarding bootstrapping logistic regression.) If the number of rows with Y=1 is small, it doesn't matter that n is huge. If the numbers of successes and failures are both huge, then as Ripley notes you can use asymptotic CIs. The mean difference in predicted probabilities is a nonlinear function of the coefficients, so you can use the delta method to get standard errors. In general, if you're not sure about normality and bias, you can use the bootstrap to estimate how close to normal the sampling distribution is. The results may surprise you. For example, for a one-sample mean, if the population has skewness = 2 (like an exponential distn), you need n=5000 before the CLT is reasonably accurate (actual one-sided non-coverage probabilities within 10% of the nominal, for a 95% interval). Finally, you can speed up bootstrapping glms by using starting values based on the coefficients estimated from the original data. And, you can compute the model matrix once and resample rows of that along with y, rather than computing a model matrix from scratch each time.
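A rough sketch of the two speed-ups just described, on simulated data. The data frame, the variable names (g, x, y), the helper prob_diff and the choice of 200 replications are assumptions made purely for illustration; glm.fit is the low-level fitter behind glm and takes a model matrix and starting values directly.

```r
# Sketch only: simulated data stand in for the real 130,000-row data set.
set.seed(1)
n <- 500
dat <- data.frame(g = rbinom(n, 1, 0.5), x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$g + 0.5 * dat$x))

fit <- glm(y ~ g + x, family = binomial, data = dat)
X  <- model.matrix(fit)   # build the model matrix once
y  <- dat$y
b0 <- coef(fit)           # starting values reused for every refit

# Statistic of interest: mean difference in predicted probability when
# every row is set to g = 1 versus g = 0.
prob_diff <- function(beta, X) {
  X1 <- X; X1[, "g"] <- 1
  X0 <- X; X0[, "g"] <- 0
  mean(plogis(X1 %*% beta) - plogis(X0 %*% beta))
}

B <- 200
boot_stat <- replicate(B, {
  i  <- sample.int(n, replace = TRUE)   # resample rows of X and y together
  Xi <- X[i, , drop = FALSE]
  fb <- glm.fit(Xi, y[i], start = b0, family = binomial())
  prob_diff(fb$coefficients, Xi)
})

est     <- prob_diff(b0, X)
se_boot <- sd(boot_stat)   # bootstrap standard error of the difference
```

Calling glm.fit skips the formula parsing and model-frame construction that glm repeats on every call, and starting from the original estimates usually saves a few IRLS iterations per replicate; how much time this saves on a given data set is not something the sketch can promise.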
Tim Hesterberg >The only reason the boot package will take more memory for 2000 >replications than 10 is that it needs to store the results. That is >not to say that on a 32-bit OS the fragmentation will not get worse, >but that is unlikely to be a significant factor. > >As for the methodology: 'boot' is support software for a book, so >please consult it (and not secondary sources). From your brief >description it looks to me as if you should be using studentized CIs. > >130,000 cases is a lot, and running the experiment on a 1% sample >may well show that asymptotic CIs are good enough. > >On Thu, 5 May 2011, E Hofstadler wrote: > >> hello, >> >> the following questions will without doubt reveal some fundamental >> ignorance, but hopefully you can still help me out. >> >> I'd like to bootstrap a coefficient gained on the basis of the >> coefficients in a logistic regression model (the mean differences in >> the predicted probabilities between two groups, where each predict() >> operation uses as the newdata-argument a dataframe of equal size as >> the original dataframe).I've got 130,000 rows and 7 columns in my >> dataframe. The glm-model uses all variables (as well as two 2-way >> interactions). >> >> System: >> - R-version: 2.12.2 >> - OS: Windows XP Pro, 32-bit >> - 3.16Ghz intel dual core processor, 2.9GB RAM >> >> I'm using the boot package to arrive at the standard errors for this >> difference, but even with only 10 replications, this takes quite a >> long time: 216 seconds (perhaps this is partly also due to my >> inefficiently programmed function underlying the boot-call, I'm also >> looking into that). >> >> I wanted to try out calculating a bca-bootstrapped confidence >> interval, which as I understand requires a lot more replications than >> normal-theory intervals. 
Drawing on John Fox' Appendix to his "An R >> Companion to Applied Regression", I was thinking of trying out 2000 >> replications -- but this will take several hours to compute on my >> system (which isn't in itself a major issue though). >> >> My Questions: >> - let's say I try bootstrapping with 2000 replications. Can I be >> certain that the memory available to R will be sufficient for this >> operation? >> - (this relates to statistics more generally): is it a good idea in >> your opinion to try bca-bootstrapping, or can it be assumed that a >> normal theory confidence interval will be a sufficiently good >> approximation (letting me get away with, say, 500 replications)? >> >> >> Best, >> Esther > >-- >Brian D. Ripley, ripley at stats.ox.ac.uk >Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ >University of Oxford, Tel: +44 1865 272861 (self) >1 South Parks Road, +44 1865 272866 (PA) >Oxford OX1 3TG, UK Fax: +44 1865 272595 > > From deepayan.sarkar at gmail.com Fri May 6 07:15:47 2011 From: deepayan.sarkar at gmail.com (Deepayan Sarkar) Date: Fri, 6 May 2011 10:45:47 +0530 Subject: [R] Bug in lattice that shipped with R 2.13.0 Message-ID: Hi all, I had meant to make this announcement earlier but had forgotten (a recent bug report reminded me). The version of lattice that ships with R 2.13.0 has a fairly serious bug in panel.abline (which would neglect to draw many negative slope lines). If you use lattice with R 2.13.0, you should update to the latest version of lattice (0.19-26 or better). 
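A small sketch of how one might check whether the installed lattice predates the fixed release. Only the version number 0.19-26 comes from the announcement above; the variable names and the wording of the message are illustrative.

```r
# Version named in the announcement; everything else here is illustrative.
fixed_version <- package_version("0.19-26")
installed     <- packageVersion("lattice")

needs_update <- installed < fixed_version
if (needs_update) {
  message("lattice ", as.character(installed), " predates ",
          as.character(fixed_version),
          "; consider update.packages(oldPkgs = \"lattice\")")
}
```

The check only reports; the actual update is left to update.packages() or install.packages("lattice"), run from a session with access to a CRAN mirror.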
-Deepayan From pdalgd at gmail.com Fri May 6 08:31:55 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Fri, 6 May 2011 08:31:55 +0200 Subject: [R] ANOVA 1 too few degrees of freedom In-Reply-To: <1304631042962-3499649.post@n4.nabble.com> References: <1304451451151-3493349.post@n4.nabble.com> <1304631042962-3499649.post@n4.nabble.com> Message-ID: <358A95B7-BC16-4D5B-B553-9CE371E0A200@gmail.com> On May 5, 2011, at 23:30 , Rovinpiper wrote: > Thanks slre, > > I seem to be making some progress now. > > Using a colon instead of an asterisk in the code really changes things. I > had been getting residual SS and MS of zero. Which is ridiculous. Now I get > much more plausible values. > > Also, When I used an asterisk instead of a colon It wouldn't give results > for three way interactions. With colons it will. > > You are correct about plot being nested within treatment. There are six > plots in each of 2 treatments. > > So, I guess I will have to perform a separate analysis to quantify the > effect of treatment. > Beware that as you have highly significant effects of plot and its interaction with day, and plot being nested in treatment, you can't test for treatment or treatment:day effect with a systematic effect of plot and plot:treatment in the model (you are only getting p values because of the sequential computation of the anova table - if you put plot before treatment, you'd get zero df). More likely, you want to make the "plot" terms random, as in ~treatment*day + Error(plot/day) > Thanks again. 
>
> Analysis of Variance Table
>
> Response: Combined.Rs
>                                                     Df  Sum Sq Mean Sq  F value    Pr(>F)
> Combined.Trt                                         1   52.80  52.805  96.2601 < 2.2e-16 ***
> Combined.Plot                                       10  677.69  67.769 123.5380 < 2.2e-16 ***
> as.factor(Combined.Day)                             16 2817.47 176.092 321.0041 < 2.2e-16 ***
> Combined.Trt:as.factor(Combined.Day)                16   47.82   2.989   5.4487 4.048e-10 ***
> Combined.Trt:Combined.Plot:as.factor(Combined.Day)  80  455.42   5.693  10.3776 < 2.2e-16 ***
> Residuals                                          284  155.79   0.549
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> --
> View this message in context: http://r.789695.n4.nabble.com/ANOVA-1-too-few-degrees-of-freedom-tp3493349p3499649.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

From Gerrit.Eichner at math.uni-giessen.de Fri May 6 09:11:01 2011
From: Gerrit.Eichner at math.uni-giessen.de (Gerrit Eichner)
Date: Fri, 6 May 2011 09:11:01 +0200 (MEST)
Subject: [R] Using functions/loops for repetitive commands
In-Reply-To: <1304604084040-3498427.post@n4.nabble.com>
References: <1304590606704-3498006.post@n4.nabble.com>
 <1304604084040-3498427.post@n4.nabble.com>
Message-ID:

Hello, Derek,

first of all, be very aware of what David Winsemius said; you are about to
enter the area of "unprincipled data-mining" (as he called it) with its
trap -- one of many -- of multiple testing. So, *if* you know what the
consequences and possible remedies are, a purely R-syntactic "solution" to
your problem might be the (again not fully tested) hack below.

> If so how can I change my code to automate the chisq.test in the same
> way I did for the wilcox.test?

Try

lapply( [], function( y) chisq.test( y, $) )

or even shorter:

lapply( [], chisq.test, $ )

However, in the resulting output you will not be seeing the names of the
variables that went into the first argument of chisq.test(). This is a
little bit more complicated to resolve:

lapply( names( []),
        function( y)
         eval( substitute( chisq.test( $y0, $tension),
                           list( y0 = y) ) ) )

Still another possibility is to use xtabs() (with its summary-method)
which has a formula argument.

Hoping that you know what to do with the results -- Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner

From jabbba at gmail.com Fri May 6 09:17:11 2011
From: jabbba at gmail.com (Jabba)
Date: Fri, 6 May 2011 09:17:11 +0200
Subject: [R] help with a survplot
In-Reply-To: <1304646131260-3500488.post@n4.nabble.com>
References: <20110430164400.531dfe9a@caprica>
 <85D19B4D-6ED7-4A15-BC43-14F8F885C062@comcast.net>
 <20110502104105.1660c591@caprica>
 <1304346610371-3490126.post@n4.nabble.com>
 <20110505211537.6047fa4e@caprica>
 <1304646131260-3500488.post@n4.nabble.com>
Message-ID: <20110506091711.21c8445a@caprica>

On Thu, 5 May 2011 18:42:11 -0700 (PDT), Frank Harrell wrote:

> Hi Marco,
>
> You're welcome.
>
> The number at risk at given time points is a fairly standard thing to
> add to survival plots.

I know, but last year, as a "newbie" in biostatistics, I felt the need
to read the rms book precisely because there were plenty of "standard
things" that did not convince me.
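
For readers who want something runnable for the chisq.test question answered by Gerrit Eichner above, here is a minimal sketch on made-up data. The data frame dat, its columns, and the choice of tension as the variable tested against are assumptions for illustration, not the original poster's data. Looping over column names keeps the variable names attached to the results.

```r
set.seed(1)
# Made-up categorical data; 'tension' is the variable every other column is tested against.
dat <- data.frame(
  tension = sample(c("L", "M", "H"), 200, replace = TRUE),
  wool    = sample(c("A", "B"), 200, replace = TRUE),
  shift   = sample(c("day", "night"), 200, replace = TRUE)
)

# Loop over column names so that the list of results is named by variable.
vars  <- setdiff(names(dat), "tension")
tests <- lapply(setNames(vars, vars),
                function(v) chisq.test(table(dat[[v]], dat$tension)))

# One p-value per variable; adjust for multiple testing before interpreting.
pvals <- sapply(tests, function(tst) tst$p.value)
p.adjust(pvals, method = "holm")
```

The Holm adjustment is only one possible response to the multiple-testing concern raised in the thread.
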
From i.visser at uva.nl Fri May 6 09:56:05 2011 From: i.visser at uva.nl (Ingmar Visser) Date: Fri, 6 May 2011 09:56:05 +0200 Subject: [R] Vermunt's LEM in R In-Reply-To: References: <81D92650D162664791CB7C482A878412055A3D32@kmbx2.utk.tennessee.edu> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From a0066612 at nus.edu.sg Fri May 6 06:24:42 2011 From: a0066612 at nus.edu.sg (Haleh Ghaem Maralani) Date: Fri, 6 May 2011 12:24:42 +0800 Subject: [R] rcspline.problem Message-ID: <91EFC2B9C48F5D4E9B6F4B5CDE901EC604A3EB1867@MBX24.stu.nus.edu.sg> Dear Dr ; I am a PhD student at Epidemiology department of National University of Singapore. I used R command (rcspline.plot) for plotting restricted cubic spline ??? the model is based on Cox. I managed to get a plot without adjustment for other covariates, but I have a problem regarding to adjusting the confounders. I applied below command to generate the matrix for covariates. m=as.matrix(age,sex) or m1=matrix(age,sex) or m2=cbind(age,sex) But, when I input ..... adj=m, or adj=m1, or adj=m2...... in the model, R gives below error: Error in pchisq(q, df, lower.tail, log.p) : Non-numeric argument to mathematical function In addition: Warning message: In coxph.fit(cbind(x, xx, adj), cbind(y, event), strata = NULL, : Loglik converged before variable 1,2,3,4 ; beta may be infinite. I would be grateful if you take my issue into your consideration and help me on this case Sincerely Yours Haleh Ghaem PhD student, NUS From andresago1 at hotmail.com Fri May 6 06:28:57 2011 From: andresago1 at hotmail.com (andre bedon) Date: Fri, 6 May 2011 14:28:57 +1000 Subject: [R] for loop Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available
URL:

From momadou at yahoo.fr Fri May 6 09:35:40 2011
From: momadou at yahoo.fr (Komine)
Date: Fri, 6 May 2011 00:35:40 -0700 (PDT)
Subject: [R] Draw a nomogram after glm
In-Reply-To: <1304596297220-3498144.post@n4.nabble.com>
References: <1304596297220-3498144.post@n4.nabble.com>
Message-ID: <1304667340965-3501534.post@n4.nabble.com>

Hi,
I use the datadist function in the rms library in order to draw my nomogram.
After reading, I tried this code:

f<-lrm(Y~L+P,data=donnee)
f <- lrm(Y~L+P,data=donnee)
d <- datadist(f,data=donnee)
options(datadist="d")
f <- lrm(Y~L+P)
summary(f,L=c(0,506,10),P=c(45,646,10))
plot(Predict(f,L=200:800,P=3))

Unfortunately, I get an error from the datadist() call:
Error in datadist(f, data = donnee) : program logic error

Please could you point me to a simpler document that is more
understandable for a new R user.
Thanks for your help.
Komine

--
View this message in context: http://r.789695.n4.nabble.com/Draw-a-nomogram-after-glm-tp3498144p3501534.html
Sent from the R help mailing list archive at Nabble.com.

From p.pagel at wzw.tum.de Fri May 6 10:03:34 2011
From: p.pagel at wzw.tum.de (Philipp Pagel)
Date: Fri, 6 May 2011 10:03:34 +0200
Subject: [R] RV: R question
In-Reply-To: <8E142E54149A314AAA92382A8D3447B129A5F82F58@EXCLUS2007.ine.cl>
References: <8E142E54149A314AAA92382A8D3447B129A5F82F58@EXCLUS2007.ine.cl>
Message-ID: <20110506080334.GA5426@maker>

> which is the maximum large of digits that R has?, because SQL work
> with 50 digits I think. and I need a software that work with a lot
> of digits.

The .Machine variable (see ?.Machine) will provide some insight into
these matters.

cu
 Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

From savicky at praha1.ff.cuni.cz Fri May 6 10:24:56 2011
From: savicky at praha1.ff.cuni.cz (Petr Savicky)
Date: Fri, 6 May 2011 10:24:56 +0200
Subject: [R] for loop
In-Reply-To:
References:
Message-ID: <20110506082455.GA4026@praha1.ff.cuni.cz>

On Fri, May 06, 2011 at 02:28:57PM +1000, andre bedon wrote:
>
> Hi,
> I'm hoping someone can offer some advice: I have a matrix "x" of
> dimensions 160 by 10000. I need to create a matrix "y", where the first
> 7 elements are equal to x[1]^1/7, then the next 6 equal to x[2]^1/6,
> next seven x[3]^1/7 and so on all the way to the 10400000th element. I
> have implemented this with a for loop an hour ago and it is still
> loading, can anyone offer any suggestions as to how I can create this
> matrix without using loops? I would really appreciate any suggestions.

Hi.

Since indexing x[1], x[2], ... is used and also the description of y
corresponds more to a vector, let me first suggest a solution for vectors.

  x <- rep(42, times=4) # any vector of even length
  x <- x/c(7, 6)
  rep(x, times=rep(c(7, 6), length=length(x)))

   [1] 6 6 6 6 6 6 6 7 7 7 7 7 7 6 6 6 6 6 6 6 7 7 7 7 7 7

The input vector may be obtained using c() from a matrix. The output vector
may be reformatted using matrix(). However, for a matrix solution, a more
precise description of the question is needed.

Hope this helps.

Petr Savicky.

From Bernhard_Pfaff at fra.invesco.com Fri May 6 09:39:02 2011
From: Bernhard_Pfaff at fra.invesco.com (Pfaff, Bernhard Dr.)
Date: Fri, 6 May 2011 08:39:02 +0100
Subject: [R] MacKinnon critical value
In-Reply-To:
References: <783DFB3F8DDE3E4B88604A9ACDA0D84104405430@ageconnt.agecon.ksu.edu>
Message-ID:

Hello Lee,

in addition to David's answer, see:

?MacKinnonPValues

in package 'urca' (CRAN and R-Forge).

Best,
Bernhard

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
> Sent: Friday, 6 May 2011 04:46
> To: Lee Schulz
> Cc: R-help at r-project.org
> Subject: Re: [R] MacKinnon critical value
>
>
> On May 5, 2011, at 9:10 PM, Lee Schulz wrote:
>
> > Hello,
> >
> > I am doing an Engle Granger test on the residuals of two I(1)
> > processes. I would like to get the MacKinnon (1996) critical value,
> > say at 10%. I have 273 observations with 5 integrated explanatory
> > variables, so that k=4. Could someone help me with the procedure in R?
>
> See if this helps:
>
> http://search.r-project.org/cgi-bin/namazu.cgi?query=Engle+Granger+mackinnon&max=100&result=normal&sort=score&idxname=functions&idxname=Rhelp08&idxname=Rhelp10&idxname=Rhelp02
>
> --
> David Winsemius, MD
> West Hartford, CT

From nick.sabbe at ugent.be Fri May 6 09:33:52 2011
From: nick.sabbe at ugent.be (Nick Sabbe)
Date: Fri, 6 May 2011 09:33:52 +0200
Subject: [R] Looping over graphs in igraph
In-Reply-To:
References:
Message-ID: <050001cc0bbf$f2c35b50$d84a11f0$@sabbe@ugent.be>

Hi Danielle.
You appear to have two problems:

1) getting the data into R
Because I don't have the file at hand, I'm going to simulate reading it
through a text connection

orgdata<-textConnection("Graph ID | Vertex1 | Vertex2 | weight\n1 | Alice | Bob | 2\n1 | Alice | Chris | 1\n1 | Alice | Jane | 2\n1 | Bob | Jane | 2\n1 | Chris | Jane | 3\n2 | Alice | Tom | 2\n2 | Alice | Kate | 1\n2 | Kate | Tom | 3\n2 | Tom | Mike | 2")
dfr <-read.table(orgdata, header=TRUE, sep="|", as.is=TRUE, strip.white=TRUE)

For you, this would probably be more like

dfr <-read.table("somepath/fileOfInterest.csv", header=TRUE, sep="|",
                 as.is=TRUE, strip.white=TRUE)

2) performing actions per graph id

require(igraph)
result<-sapply(unique(dfr$Graph.ID), function(curID){
  #There may be more elegant ways of creating the graphs per ID, but it works
  curDfr<- dfr[dfr$Graph.ID==curID,]
  g<-graph.edgelist(as.matrix(curDfr[,c("Vertex1", "Vertex2")]))
  g<-set.edge.attribute(g, "weight", value= curDfr$weight)
  #return whatever information you're interested in, based on graph object g
  #for now I'm just returning edge and vertex counts
  return(c(v=vcount(g), e=ecount(g)))
})
colnames(result)<-unique(dfr$Graph.ID)
print(result)

HTH,

Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Danielle Li
Sent: Thursday, 5 May 2011 22:25
To: r-help at r-project.org
Subject: [R] Looping over graphs in igraph

Hi,

I'm trying to do some basic social network analysis with igraph in R, but
I'm new to R and haven't been able to find documentation on a couple basic
things:

I want to run igraph's community detection algorithms on a couple thousand
small graphs but don't know how to automate igraph looking at multiple
graphs described in a single csv file.
My data look like something in ncol format, but with an additional column
that has an ID for which graph the edge belongs in:

Graph ID | Vertex1 | Vertex2 | weight
1 | Alice | Bob | 2
1 | Alice | Chris | 1
1 | Alice | Jane | 2
1 | Bob | Jane | 2
1 | Chris | Jane | 3
2 | Alice | Tom | 2
2 | Alice | Kate | 1
2 | Kate | Tom | 3
2 | Tom | Mike | 2

so on and so forth for about 2000 graph IDs, each with about 20-40 vertices.
I've tried using the "split" command but it doesn't recognize my graph id
("object 'graphid' not found")--this may just be because I don't know how to
classify a column of a csv as an object.

Ultimately, I want to run community detection on each graph separately--to
look only at the edges when the graph identifier is 1, make calculations on
that graph, then do it again for 2 and so forth. I suspect that this isn't
related to igraph specifically--I just don't know the equivalent command in
R for what in pseudo Stata code would read as:

forvalues i of 1/N {
    temp_graph=subrows of the main csv file for which graphid==`i'
    cs`i' = leading.eigenvector.community.step(temp_graph)
    convert cs`i'$membership into a column in the original csv
}

I want the output to look something like:

Graph ID | Vertex1 | Vertex2 | weight | Vertex 1 membership | Vertex 2 membership | # of communities in the graph
1 | Alice | Bob | 2 | A | B | 2
1 | Alice | Chris | 1 | A | B | 2
1 | Alice | Jane | 2 | A | B | 2
1 | Bob | Jane | 2 | B | B | 2
1 | Chris | Jane | 3 | B | B | 2
2 | Alice | Tom | 2 | A | B | 3
2 | Alice | Kate | 1 | A | C | 3
2 | Kate | Tom | 3 | C | B | 3
2 | Tom | Mike | 2 | B | C | 3

Here, the graphs are treated completely separately so that community A in
graph 1 need not have anything to do with community A in graph 2.

I would really appreciate any ideas you guys have.

Thank you!
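
On the split() error mentioned above: a column has to be referred to through its data frame (for example dfr$Graph.ID), not by a bare name. A minimal base-R sketch on a few rows of the example data follows. The object names dfr, per_graph and summ are assumptions for illustration, and the per-graph summary is only a placeholder for whatever igraph computation is wanted.

```r
dfr <- data.frame(
  Graph.ID = c(1, 1, 1, 2, 2),
  Vertex1  = c("Alice", "Alice", "Bob", "Alice", "Kate"),
  Vertex2  = c("Bob", "Chris", "Jane", "Tom", "Tom"),
  weight   = c(2, 1, 2, 2, 3),
  stringsAsFactors = FALSE
)

# split() needs the column itself, hence dfr$Graph.ID rather than a bare 'graphid'.
per_graph <- split(dfr, dfr$Graph.ID)

# Placeholder summary per graph: number of distinct vertices and number of edges.
summ <- sapply(per_graph, function(d) {
  c(vertices = length(unique(c(d$Vertex1, d$Vertex2))), edges = nrow(d))
})
summ
```

Each element of per_graph is an ordinary data frame, so the function passed to sapply() could equally build a graph from the two vertex columns and return community memberships.
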

Danielle

 [[alternative HTML version deleted]]

From i.petzev at gmail.com Fri May 6 09:35:21 2011
From: i.petzev at gmail.com (ivan)
Date: Fri, 6 May 2011 09:35:21 +0200
Subject: [R] for loop with global variables
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From jim at bitwrit.com.au Fri May 6 10:35:58 2011
From: jim at bitwrit.com.au (Jim Lemon)
Date: Fri, 06 May 2011 18:35:58 +1000
Subject: [R] Insert values to histogram
In-Reply-To: <1304596216055-3498140.post@n4.nabble.com>
References: <1304596216055-3498140.post@n4.nabble.com>
Message-ID: <4DC3B2EE.4080403@bitwrit.com.au>

On 05/05/2011 09:50 PM, matibie wrote:
> I'm trying to add the exact value on top of each column of an Histogram, i
> have been trying with the text function but it doesn't work.
> The problem is that the program it self decides the exact value to give to
> each column, and ther is not like in a bar-plot that I know exactly which
> values are been plotting.

Hi Matias,
You are probably using the "hist" function in the graphics package. If
so, that function returns a list containing components named "counts"
(for frequency histograms) and "density" (for density histograms). So if
you collect that list:

histinfo<-hist(...)
histinfo$counts

you will see the heights of the bars. As Greg has noted, many people do
not agree with adding the counts to the plot, but if you want to do it,
there are your numbers.
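
A minimal sketch of that approach on simulated data; the vector x and the plot title are made up for illustration. Besides the heights in $counts, the object returned by hist() holds the bar midpoints in $mids, so text() can place each count just above its bar.

```r
set.seed(42)
x <- rnorm(200)  # made-up data

# Keep the object hist() returns; plot it with headroom above the tallest bar.
histinfo <- hist(x, plot = FALSE)
plot(histinfo, ylim = c(0, max(histinfo$counts) * 1.1),
     main = "Histogram with counts")

# Write each count above its bar (pos = 3 places the label above the point).
text(histinfo$mids, histinfo$counts, labels = histinfo$counts, pos = 3)
```

For a density histogram the same idea applies with histinfo$density in place of histinfo$counts.
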

Jim

From jim at bitwrit.com.au Fri May 6 10:43:16 2011
From: jim at bitwrit.com.au (Jim Lemon)
Date: Fri, 06 May 2011 18:43:16 +1000
Subject: [R] Null
In-Reply-To: <1304599725057-3498261.post@n4.nabble.com>
References: <1304599725057-3498261.post@n4.nabble.com>
Message-ID: <4DC3B4A4.5000702@bitwrit.com.au>

On 05/05/2011 10:48 PM, pcc wrote:
> This is probably a very simple question but I am completely stumped!I am
> trying to do shapiro.wilk(x) test on a relatively small dataset(75) and each
> time my variable and keeps coming out as 'NULL', and
>
>> shapiro.test(fcv)
> Error in complete.cases(x) : no input has determined the number of cases
>
> my text file looks like this:
>
Hi pcc,
I think the problem may be in the way you are reading in the data. Try
this (I named the data file "null.csv"):

fcv<-read.csv("null.csv")
shapiro.test(fcv[,1])

Jim

From nikkihathi at gmail.com Fri May 6 10:59:08 2011
From: nikkihathi at gmail.com (neetika nath)
Date: Fri, 6 May 2011 09:59:08 +0100
Subject: [R] [caret package] [trainControl] supplying predefined partitions to train with cross validation
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From fabon.dzogang at lip6.fr Fri May 6 12:32:04 2011
From: fabon.dzogang at lip6.fr (Fabon Dzogang)
Date: Fri, 6 May 2011 12:32:04 +0200
Subject: [R] [caret package] [trainControl] supplying predefined partitions to train with cross validation
In-Reply-To:
References:
Message-ID:

Hello,

Thank you for your reply, but I'm not sure your code answers my needs.
From what I read, it creates a 10-fold partition and then extracts the kth
partition for future processing.

My question was rather: once I have a 10-fold partition of my data,
how do I supply it to the "train" function of the caret package?
Here's some sample code:

folds <- createFolds(my_dataset_classes, 10)

# I can't use index=folds on this one, it will train on the 1/k and test on k-1
t_control <- trainControl(method="cv", number=10)

# here I would like train to take account of my predefined folds
model <- train(my_dataset_predictors, my_dataset_classes,
               method="svmLinear", trControl = t_control)

Cheers,
Fabon.

On Fri, May 6, 2011 at 10:59 AM, neetika nath wrote:
> Hi,
> I did the similar experiment with my data. may be following code will give
> you some idea. It might not be the best solution but for me it worked.
> please do share if you get other idea.
> Thank you
> #### CODE###
>
> library(dismo)
>
> set.seed(111)
>
> dd<-read.delim("yourfile.csv",sep=",",header=T)
>
> # To keep a check on error
>
> options(error=utils::recover)
>
> # dd- data to be split for 10 Fold CV, this will split complete data into 10
> fold
>
> number<-kfold(dd, k=10)
>
> case 1: if k ==1
>
> x<-NULL;
>
> #retrieve all the index (from your data) for 1st fold in x, such that you
> can use it as a test set and remaining can be used as train set for #1st
> iteration.
>
> x<-which(number==k)
>
> On Thu, May 5, 2011 at 11:43 PM, Fabon Dzogang
> wrote:
>>
>> Hi all,
>>
>> I run R 2.11.1 under ubuntu 10.10 and caret version 2.88.
>>
>> I use the caret package to compare different models on a dataset. In
>> order to compare their different performances I would like to use the
>> same data partitions for every models. I understand that using a LGOCV
>> or a boot type re-sampling method along with the "index" argument of
>> the trainControl function, one is able to supply a training partition
>> to the train function.
>>
>> However, I would like to apply a 10-fold cross validation to validate
>> the models and I did not find any way to supply some predefined
>> partition (created with createFolds) in this setting. Any help ?
>>
>> Thank you and great package by the way !
>>
>> Fabon Dzogang.
>>
>
>

--
Fabon Dzogang

From matibie at gmail.com Fri May 6 11:54:55 2011
From: matibie at gmail.com (matibie)
Date: Fri, 6 May 2011 02:54:55 -0700 (PDT)
Subject: [R] Insert values to histogram
In-Reply-To: <4DC3B2EE.4080403@bitwrit.com.au>
References: <1304596216055-3498140.post@n4.nabble.com>
 <4DC3B2EE.4080403@bitwrit.com.au>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From thibault.charles at solamen.fr Fri May 6 10:33:21 2011
From: thibault.charles at solamen.fr (Thibault Charles)
Date: Fri, 6 May 2011 10:33:21 +0200
Subject: [R] [R-users] Problems with the functions "samplingSimple" and "morris"
Message-ID: <000101cc0bc8$41c36170$c54a2450$@charles@solamen.fr>

An embedded text encoded in an unknown character set was scrubbed...
Name: not available
URL:

From Berwin.Turlach at gmail.com Fri May 6 11:53:33 2011
From: Berwin.Turlach at gmail.com (Berwin A Turlach)
Date: Fri, 6 May 2011 17:53:33 +0800
Subject: [R] Using $ accessor in GAM formula
In-Reply-To: <4DC31D9A.3020402@xtra.co.nz>
References: <6A4060E3-3093-48A2-A83E-9E66E57530DE@comcast.net>
 <4DC2F506.4090306@ucalgary.ca>
 <4DC31D9A.3020402@xtra.co.nz>
Message-ID: <20110506175333.75fda4e1@bossiaea>

G'day Rolf,

On Fri, 06 May 2011 09:58:50 +1200
Rolf Turner wrote:

> but it's strange that the dodgey code throws an error with gam(dat1$y
> ~ s(dat1$x)) but not with gam(dat2$cf ~ s(dat2$s))

> Something a bit subtle is going on; it would be nice to be able to
> understand it.
Well, R> traceback() 3: eval(expr, envir, enclos) 2: eval(inp, data, parent.frame()) 1: gam(dat$y ~ s(dat$x)) So the lines leading up to the problem seem to be the following from the gam() function: vars <- all.vars(gp$fake.formula[-2]) inp <- parse(text = paste("list(", paste(vars, collapse = ","), ")")) if (!is.list(data) && !is.data.frame(data)) data <- as.data.frame(data) Setting R> options(error=recover) running the code until the error occurs, and then examining the frame number for the gam() call shows that "inp" is "expression(list( dat1,x ))" in your first example and "expression(list( dat2,s ))" in your second example. In both examples, "data" is "list()" (not unsurprisingly). When, dl <- eval(inp, data, parent.frame()) is executed, it tries to eval "inp", in both cases "dat1" and "dat2" are found, obviously, in the parent frame. In your first example "x" is (typically) not found and an error is thrown, in your second example an object with name "s" is found in "package:mgcv" and the call to eval succeeds. "dl" becomes a list with two components, the first being, respectively, "dat1" or "dat2", and the second the body of the function "s". (To verify that, you should probably issue the command "debug(gam)" and step through those first few lines of the function until you reach the above command.) The corollary is that you can use the name of any object that R will find in the parent frame, if it is another data set, then that data set will become the second component of "inp". E.g.: R> dat=data.frame(min=1:100,cf=sin(1:100/50)+rnorm(100,0,.05)) R> gam(dat$cf ~ s(dat$min)) Family: gaussian Link function: identity Formula: dat$cf ~ s(dat$min) Estimated degrees of freedom: 3.8925 total = 4.892488 GCV score: 0.002704789 Or R> dat=data.frame(BOD=1:100,cf=sin(1:100/50)+rnorm(1