How do I count thee? Let me count the ways?


Showing posts with label statistics. Show all posts

Wednesday, June 22, 2022

What is the difference between statistics and data analysis?


Of course, to answer this we need to define those terms, and definitions of such things are hardly standard. But they are not particularly standard in other disciplines either. Can you define art? Music? How about mathematics?

Would you have defined mathematics as "including such topics as numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes," as Wikipedia does? Is this all-encompassing?

Statistics and data analysis overlap. Both involve defining, exploring, cleaning, visualizing, and describing data. Data analysis students study some traditional statistics, and statistics students nowadays study some data analysis.

The father-and-daughter team of Larose and Larose has suggested a working distinction between inferential statistics and data mining (so neither of these is identical to the terms in the first sentence above), as follows:

Inferential statistics starts with a prior hypothesis about a population and tests that hypothesis with a sample from that population. The test may yield statistical significance even when there is no practical significance.

Data mining does not begin with a prior hypothesis; rather, the analyst "freely trolls through the data for actionable results." (Larose & Larose, 2015, p. 161)
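The gap between statistical and practical significance is easy to demonstrate with a large sample. Here is a minimal Python sketch, my own illustration rather than anything from Larose and Larose. It assumes a one-sample z-test with a known population standard deviation, and the numbers (a scale with mean 100 and SD 15, and a sample mean of 100.1) are hypothetical.

```python
import math

def one_sample_z_test(sample_mean, hypothesized_mean, sigma, n):
    """Two-sided one-sample z-test, assuming the population SD sigma is known."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical scenario: population mean 100, SD 15, observed sample mean 100.1,
# i.e. a shift of only 0.1 points on the scale.
for n in (100, 1_000_000):
    z, p = one_sample_z_test(100.1, 100, 15, n)
    effect_size = (100.1 - 100) / 15   # Cohen's d, which does not depend on n
    print(f"n={n:>9,}  z={z:6.2f}  p={p:.2g}  d={effect_size:.4f}")
```

With n = 100 the shift is nowhere near significant (p is about 0.95). With a million observations the same 0.1-point shift gives z of about 6.67 and a p-value far below 0.001, yet the effect size d of about 0.007 is unchanged and negligible for any practical purpose.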

Larose, D. T., & Larose, C. D. (2015). Data mining and predictive analytics. Hoboken, NJ: Wiley.

Sunday, April 26, 2015

Math is math, except in social science

"So how are the eggs?" "Eggs are eggs." "Eggs are eggs. That is very profound. By the same token, couldn't you say fish is fish? I don't think so." 

So goes a Seinfeld dialogue. Similarly, Sigmund Freud is alleged to have said, "Sometimes a cigar is just a cigar," although researchers question whether he really said it.

And by the same logic, math is math, whether it is taught in high school or college, at a 2-year community college or a 4-year college. Right? I found a counterexample in statistics.

I recently taught an intro to statistics class in a psychology department using a statistics textbook for the behavioral sciences. For descriptive purposes, this book defines the sample standard deviation as $S_X$, with an N denominator; for inferential purposes, it defines the sample standard deviation as $s_X$, with an N - 1 denominator. I found a second statistics textbook for the behavioral sciences that agrees with this.
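To see how much the two definitions differ, here is a short Python sketch (my own illustration, not code from either textbook); the function names and the eight scores are hypothetical. Both versions share the same sum of squared deviations and differ only in the denominator.

```python
import math
import statistics

def sum_of_squares(xs):
    """Sum of squared deviations from the mean (the numerator of both formulas)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def sd_descriptive(xs):
    """S_X as the behavioral-science textbook defines it: divide by N."""
    return math.sqrt(sum_of_squares(xs) / len(xs))

def sd_inferential(xs):
    """s_X: divide by N - 1, the usual sample standard deviation."""
    return math.sqrt(sum_of_squares(xs) / (len(xs) - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample of N = 8 scores
print(sd_descriptive(scores))       # sqrt(32 / 8) = 2.0
print(sd_inferential(scores))       # sqrt(32 / 7), about 2.138

# The Python standard library offers the same pair of choices.
assert math.isclose(sd_descriptive(scores), statistics.pstdev(scores))
assert math.isclose(sd_inferential(scores), statistics.stdev(scores))
```

Python's `statistics.pstdev` and `statistics.stdev` make the same split, as do Excel's STDEVP and STDEV; the N - 1 version is the one Excel's STDEV computes.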

Is there a recent textbook in the math or statistics world that defines the sample standard deviation with an N denominator? I haven't seen one. And a student in this psychology class will find not only that this definition conflicts with the math world, but also (as one student did) that it conflicts with the Excel world: both with Excel's standard deviation function (=STDEV, which divides by N - 1) and with the statistics functions in Excel's Data Analysis add-in.

Why can't we all just get along?

Monday, January 28, 2013

We do this in chapter 2; by chapter 8 students have forgotten

Every time I teach statistics, at least one student will ask me a question about chapter 8, whose answer is in chapter 2.

In chapter 2 we calculate the standard deviation of a sample. I have the class work through one calculation by hand, on a sample of three numbers, using each step of the formula $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$. Then I show them how to do it with a single Excel function, =STDEV. By the end of chapter 2, every student can do this in Excel.
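The same chapter 2 exercise can be mirrored in Python, assuming the hypothetical three-number sample below. The function walks through the formula one step at a time, the way the class does by hand, and `statistics.stdev` plays the role of Excel's single =STDEV call; both use the n - 1 denominator.

```python
import math
import statistics

def sample_sd_by_steps(xs):
    """Follow the sample standard deviation formula one step at a time."""
    n = len(xs)
    mean = sum(xs) / n                       # step 1: the sample mean, x-bar
    deviations = [x - mean for x in xs]      # step 2: x_i - x-bar
    squared = [d ** 2 for d in deviations]   # step 3: (x_i - x-bar)^2
    sum_sq = sum(squared)                    # step 4: sum of the squares
    variance = sum_sq / (n - 1)              # step 5: divide by n - 1
    return math.sqrt(variance)               # step 6: take the square root

sample = [2, 4, 9]                     # hypothetical three-number sample
by_hand = sample_sd_by_steps(sample)   # mean 5, squares 9 + 1 + 16 = 26, 26 / 2 = 13
one_call = statistics.stdev(sample)    # one function call, like =STDEV in Excel
print(by_hand, one_call)               # both are sqrt(13), about 3.606
```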

We discuss standard deviations in the subsequent chapters, such as when we do Normal distributions, but we don’t calculate any.

In chapter 8 we do hypothesis testing of two samples that are matched pairs, such as a husband’s data and a wife’s data. We take the difference between the two values in each pair, calculate the mean of the differences, and the standard deviation of the differences. And at least one student will ask how to do these standard deviations. The answer is: the same way we did them in chapter 2.
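The chapter 8 calculation reuses the chapter 2 tool. Below is a minimal Python sketch with hypothetical husband-and-wife scores: each pair is reduced to one difference, and the mean and standard deviation of those differences are computed exactly as for any single sample. The last step computes the usual paired t statistic, t = d-bar / (s_d / sqrt(n)), included only for context.

```python
import math
import statistics

# Hypothetical matched pairs: (husband's score, wife's score) for five couples.
pairs = [(12, 10), (15, 14), (9, 11), (14, 10), (11, 9)]

# Reduce each pair to a single number: the difference within the pair.
differences = [husband - wife for husband, wife in pairs]

mean_diff = statistics.mean(differences)   # mean of the differences, d-bar
sd_diff = statistics.stdev(differences)    # same n - 1 formula as in chapter 2

# Paired t statistic for the null hypothesis that the mean difference is 0.
n = len(differences)
t = mean_diff / (sd_diff / math.sqrt(n))

print(differences)                         # [2, 1, -2, 4, 2]
print(mean_diff, round(sd_diff, 3), round(t, 3))
```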

Even if you don't teach statistics, I bet you have a similar experience. Comments?