Objection to Study 1: Isn't this the same thing as observing tipping/nudity behavior in people's own homes except that it is in the laboratory and you are paying for the pizza? That is to say, isn't this really just an observational design but in the foreign habitat of the laboratory? If the causal process behind the higher-nude-tipping phenomenon is really the first that you identified (that there are 2 sorts of persons, sort (a) tips higher and sometimes comes to the door nude and sort (b) tips lower and never or seldom comes to the door nude) we would expect to see a higher mean tip among the lab rats that come to the door nude. If the causal process is really the second one you identified (within-person variation depending on state of clothedness) we would also expect to see a higher mean tip among the people in the lab that came to the door nude. In other words, the study does nothing to differentiate between the two causal processes we are interested in.
To me, the design would seem to allow us to conclude, especially in conjunction with Study #2, that the difference between nude and non-nude tippers is not the result of simple happenstance, where people just happen to be nude sometimes when answering the door. In this case, if a difference were observed, we would establish that people who prefer to answer the door nude, when answering the door nude, tip more than people who prefer to answer the door in pajamas, when answering the door in pajamas. From this study alone, you are right that we cannot determine whether being nude itself causes more tipping. Comparing the results from this study to Study #2 would allow us to compare tips from someone who has chosen to be nude vs. someone who is nude by random assignment.
Alternatively, if we wanted to make the causal inference in the same study, we could add a third condition to this study in which the third group was given pajamas and not the option of being nude--using the logic of instrumental variables, if the nudity itself was not causal (but merely a means of distinguishing dispositionally large- from small-tippers), then the average tip in this third group should equal the average tip in the first and second groups pooled together.
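The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the original design: a hypothetical two-type population in which everyone disposed to answer the door nude does so when offered the choice, random assignment to the choice and pajamas-only conditions, and no effect of the offer on tips except through nudity. The function names, parameters, and dollar values are all made up for the example.

```python
def expected_mean_tips(share_nude_choosers, tip_disposed, tip_other, nudity_effect):
    """Expected mean tips in the two conditions (hypothetical two-type model).

    share_nude_choosers: fraction who answer nude when given the choice
    tip_disposed:        clothed tip of people disposed to answer nude
    tip_other:           tip of people who always answer in pajamas
    nudity_effect:       extra tip caused by being nude (0 if nudity is not causal)
    """
    # Choice condition: nude-choosers are nude, everyone else wears pajamas.
    pooled_choice = (share_nude_choosers * (tip_disposed + nudity_effect)
                     + (1 - share_nude_choosers) * tip_other)
    # Pajamas-only condition: same mix of people, nobody is nude.
    pajamas_only = (share_nude_choosers * tip_disposed
                    + (1 - share_nude_choosers) * tip_other)
    return pooled_choice, pajamas_only


def implied_nudity_effect(pooled_choice, pajamas_only, share_nude_choosers):
    """Gap between conditions, scaled by the share who actually went nude."""
    return (pooled_choice - pajamas_only) / share_nude_choosers


# Illustrative values only: 25% nude-choosers, $6 vs. $4 dispositional tips.
no_effect = expected_mean_tips(0.25, 6.0, 4.0, nudity_effect=0.0)
with_effect = expected_mean_tips(0.25, 6.0, 4.0, nudity_effect=2.0)
```

Under these assumptions, the two means coincide when nudity is not causal even though nude-choosers tip more than pajama-choosers. When nudity does matter, the gap between conditions divided by the share of nude-choosers recovers the effect of nudity among those who chose it.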
Objection to Study 2: This one is more sophisticated, but still has problems. Causal process (a) relies on there being the kind of person who willingly goes to the door nude and there being a sort of person who does not willingly go to the door nude. This study lumps both types of person together and makes it impossible to differentiate between them because people have no choice as to whether or when they go to the door nude. So this study wouldn't actually speak to causal process (a) at all. This study seems a more direct test of causal process (b) because it tests for within-person variation dependent on clothed-ness. The fact that people are told whether to go to the door nude, however, means that you are lumping potential-nudies together with people who would normally never, ever, go to the door nude. If you did observe a difference of nude/clothed mean tips, you would have to assume that people who would never go to the door nude behave (when forced to go nude) like people who would sometimes go to the door nude in order to claim that the findings are meaningful. Any claims about effect size would be questionable because of the lumping of nudies and non-nudies.
Here, I think, the normally trenchant critic from Beauxbaton goes astray, perhaps as a result of the deficiencies in the French educational system. If there are really only two possible broad explanations, and experimental results give reason to rule out one explanation, then this counts as evidence for the other explanation. If we observe no difference between the two experimental groups, this would be evidence against there being any special causal effect of nude-ness, and thus would be evidence for the idea that nude-door-answerers are just dispositionally bigger tippers.
However, our friend from Beauxbaton does point to a real problem: underlying all this is the possibility that nudity could have a causal effect but only for the kind of people who would sometimes answer the door nude anyway. In Study #2, this would result in a higher average tip when subjects were nude, but that difference in the average could be attributed wholly to the subset of respondents who are clothing-optional-door-answerers in the real world. If we had some measure of that disposition (say, on a pre-experiment questionnaire), we could see whether that was indeed the case. Alternatively, we can get at that issue through the combination of this experiment and the previous experiment with the added pajamas-for-everyone condition.
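The questionnaire check amounts to computing the nude-minus-clothed difference separately for each self-reported disposition. A minimal sketch follows, assuming hypothetical field names (`sometimes_nude`, `assigned_nude`, `tip`) and a handful of made-up records that are not data from any actual study:

```python
from statistics import mean


def nude_minus_clothed(records):
    """Difference in mean tip between randomly assigned nude and clothed subjects."""
    nude = [r["tip"] for r in records if r["assigned_nude"]]
    clothed = [r["tip"] for r in records if not r["assigned_nude"]]
    return mean(nude) - mean(clothed)


def difference_by_disposition(records):
    """Same difference, split by the pre-experiment questionnaire answer."""
    return {
        flag: nude_minus_clothed([r for r in records if r["sometimes_nude"] == flag])
        for flag in (True, False)
    }


# Made-up records for illustration only.
records = [
    {"sometimes_nude": True, "assigned_nude": True, "tip": 8.0},
    {"sometimes_nude": True, "assigned_nude": False, "tip": 6.0},
    {"sometimes_nude": False, "assigned_nude": True, "tip": 4.0},
    {"sometimes_nude": False, "assigned_nude": False, "tip": 4.0},
]
overall = nude_minus_clothed(records)
by_group = difference_by_disposition(records)
```

In these invented records the overall difference is positive, yet it comes entirely from the subjects who reported sometimes answering the door nude, which is the pattern the objection worries about.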