
The pitfalls of statistics
Description
Book Introduction
Statistics are everywhere: in opinion polls, the stock market, earthquake predictions, weather forecasts, public health, and sports. They can help us understand the world, but they can also deceive and confuse us. This book delves into the statistical misunderstandings behind some of the best-known paradoxes in mathematics and shows how to analyze and interpret statistics correctly. Incorrect statistical analysis and interpretation can lead to more than misunderstanding: inaccurate medical diagnoses, failure to predict large earthquakes, worsening social inequality, and flawed policy decisions. There are right and wrong ways to read statistical numbers, and this book clearly explains which is which.
Table of Contents
Chapter 1.
Are you normal? Hint: No.
__existence···arm length
__why?
__Distribution Comparison
__How Gaussian is it?
__The Myth of the 'Average Man'
__Big Five
__We are all equally abnormal
__But some are more equal than others
__Sources and related literature
Chapter 2.
Relay races and turnstiles
__Class size
__Removing bias from data
__Where is my train?
__Are you popular? Hint: No
__Finding super spreaders
__Road Rage
__If you are just visiting once
__Recidivism rate
__The inspection paradox is everywhere
__Sources and related literature
Chapter 3.
Defy tradition and save the world.
__family size
__The Great Depression and the Baby Boom
__More recently
__Preston's Paradox
__If you have one less child
__In the long run
__Reality is
__today
__Sources and related literature
Chapter 4.
Extremes, Outliers, and GOATs
__exception
__Birth weight is Gaussian
__Weight gain simulation
__running speed
__Chess Rankings
__Best ever
__What should we do?
__Sources and related literature
Chapter 5.
Better than new
__electric bulb
__Even now, soon
__Survival period of cancer patients
__Life expectancy at birth
__child mortality rate
__Immortal Swede
__Sources and related literature
Chapter 6.
Jump to conclusions
__Math and verbal skills
__Elite University
__The less excellent, the greater the correlation
__Second-tier college
__Berkson's Paradox in Hospital Data
__Berkson and COVID-19
__Berkson and Psychology
__Berkson and us
__Sources and related literature
Chapter 7.
Causality, Conflict, and Chaos
__3 million infant data cannot be wrong
__Other groups
__The End of Paradox
__The Twin Paradox
__The Paradox of Obesity
__Berkson's Toaster
__Causal Diagram
__Sources and related literature
Chapter 8.
The long tail of disaster
__Distribution of disasters
__earthquake
__solar flare
__lunar crater
__asteroid
__Origin of the long-tailed distribution
__Stock market crash
__Black Swan and Gray Swan
__The World of Long-Tailed Distributions
__Sources and related literature
Chapter 9.
Fairness and Error
__medical examination
__higher prevalence
__higher specificity
__bad medicine
__Drunk driving
__Vaccine effectiveness
__Crime Prediction
__Group comparison
__Fairness is difficult to define
__Fairness is difficult to achieve
__All About Base Rate
__Sources and related literature
Chapter 10.
Penguins, Pessimists, and Paradoxes
__Old optimist, young pessimist
__real wages
__Penguins
__Simpson's Prescription
__Do vaccines work? Hint: Yes.
__Re-discussion of the truth
__Open data, open discussion
__Sources and related literature
Chapter 11.
Change your mind
__Old racists?
__young feminists
__A remarkable decline in homophobia
__What happened in 1990?
__Is it a cohort effect or a period effect?
__The Overton window
__Sources and related literature
Chapter 12.
Following the Overton window
__Old conservative, young liberal?
__What does 'conservative' mean?
__How can this be?
__The center is not stationary
__Everything is relative
__Have we become more polarized?
__Following the Overton window
__Sources and related literature
__Appendix: 15 Questions
Epilogue
Publisher's Review
Some of the cases covered in this book are based on previously published research, while others are my own observations and explorations of the data.
Rather than simply reporting previous research results or copying figures, I replicated the analyses and created my own figures. In some cases the original work did not hold up on replication, and I have excluded those cases from this book.
In some cases, I was able to perform the same analysis with more recent data.
These updates also produced some unexpected insights.
For example, the 'low birth weight paradox' was first observed in the 1970s and persisted until the 1990s, but has disappeared in recent data.
All the work presented in this book is based on tools and methodologies from the field of reproducible science.
I use Jupyter notebooks to combine text, computer code, and results into a single document.
These documents are organized using a version control system to ensure consistency and accuracy.
In all, I wrote about 6,000 lines of Python code using reliable open source libraries like NumPy, SciPy, and pandas.
Of course, my code may have bugs, but I've tested it to minimize the risk of errors that seriously affect the results.
My Jupyter notebooks are publicly available online, so anyone can easily reproduce the analyses I run.
Author's Note
We can use data to answer questions and resolve disputes.
Data can be used to make better decisions, but it's not always easy.
One problem is that our intuitions about probability are sometimes dangerously misleading.
For example, in October 2021, a guest on a popular podcast claimed with some concern that “more than 70% of COVID-19 deaths in the UK have been in people who have been vaccinated.”
The number he cited was correct.
The figures came from a report published by Public Health England, which uses reliable national statistics.
But his suggestion that vaccines are useless or actually harmful is wrong.
As we will see in Chapter 9, we can use data from the same report to calculate the vaccine's effectiveness and estimate how many lives it saved.
It turns out that the vaccine was more than 80% effective in preventing deaths and saved more than 7,000 lives out of a population of 48 million over a four-week period.
If we are given the opportunity to save 7,000 lives a month, we should take that opportunity.
The mistake the podcast guest made is called the "base rate fallacy," and it's a mistake anyone can easily make.
In this book, we will see examples from healthcare, the judicial system, and other related fields where probabilistic decisions can determine health, liberty, or even life.
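The arithmetic behind the base rate fallacy can be sketched in a few lines of Python. This is a minimal illustration, not the author's analysis: the function name `vaccinated_share_of_deaths` is hypothetical, the 93% coverage figure is an assumption chosen for illustration and is not taken from the Public Health England report, and the model assumes vaccinated and unvaccinated people are otherwise equally at risk.

```python
def vaccinated_share_of_deaths(coverage, effectiveness):
    """Return the fraction of all deaths that occur among vaccinated people.

    coverage: fraction of the population that is vaccinated (0 to 1)
    effectiveness: fractional reduction in death risk from vaccination (0 to 1)
    Simplifying assumption: both groups face the same risk apart from the vaccine.
    """
    # Deaths in each group are proportional to group size times relative risk.
    vaccinated = coverage * (1 - effectiveness)
    unvaccinated = (1 - coverage) * 1.0
    return vaccinated / (vaccinated + unvaccinated)


# Assumed inputs for illustration only: 93% coverage, 80% effectiveness.
share = vaccinated_share_of_deaths(coverage=0.93, effectiveness=0.80)
print(f"Share of deaths among the vaccinated: {share:.0%}")
```

With these assumed inputs, most deaths occur among vaccinated people even though the vaccine cuts the risk of death by 80%, simply because almost everyone is vaccinated. The share of deaths in a group says little about risk unless the size of the group, the base rate, is taken into account.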
Translator's Note
“There are three kinds of lies in the world: lies, damned lies, and statistics.” This saying, attributed to Benjamin Disraeli and made famous by Mark Twain, highlights how often statistics are misused or abused in real life.
It can also be read as a warning to be wary of the political intentions lurking behind statistics: when looking at statistical data, we should not be swayed by surface results and interpretations, but should fact-check them thoroughly and rigorously.
The COVID-19 pandemic and the statistical debates surrounding it, which have swept the world in recent years, bring that cynical warning to mind.
One podcast, notorious as a hotbed of misinformation during the pandemic yet highly influential because of its large subscriber base, aired the claim that “vaccinated people under 60 in the UK have a mortality rate twice that of unvaccinated people of the same age,” and the claim sparked a global debate.
The person who made the claim was a former reporter for the New York Times, and the fact that the claim rested on official data from the UK's Office for National Statistics caused an even bigger stir.
The claim, based on seemingly unbiased official data, became a formidable weapon for vaccine refusers and pandemic conspiracy theorists.
The data and graphs, which appeared to faithfully reflect the official data of the Office for National Statistics, had two fatal problems.
First, the reporter who claimed that vaccines actually increase mortality lacked the knowledge and expertise to interpret statistical data properly.
Second, he selected only the age groups and time intervals that supported his claim and ignored the data that did not.
In reality, data from the Office for National Statistics that should have demonstrated the vaccine's effectiveness was distorted to argue the opposite.
When the data are broken down by individual age group or sex, they show one trend, say a decrease; when all age groups and sexes are pooled, they show the opposite trend, an increase. This reversal, known as “Simpson’s paradox,” played a role in the journalist’s theory of vaccine risk.
Chapter 10 of this book, "Penguins, Pessimists, and Paradoxes," clearly and accessibly exposes the flaws in this argument, which helped worsen the pandemic.
The puzzle of the measurements of Antarctic penguins and the question of whether we invariably become more cynical as we age are two related examples that clearly illustrate this flaw.
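A reversal of this kind can be reproduced with a small Python sketch. The counts below are invented for illustration and are not taken from the Office for National Statistics or from the book; the helper names `rate_per_100k` and `pooled_rate` are likewise hypothetical.

```python
# Hypothetical counts, invented for illustration: (people, deaths) per group.
groups = {
    "younger": {"unvaccinated": (80_000, 8), "vaccinated": (20_000, 1)},
    "older": {"unvaccinated": (20_000, 40), "vaccinated": (80_000, 80)},
}


def rate_per_100k(people, deaths):
    """Deaths per 100,000 people."""
    return deaths / people * 100_000


def pooled_rate(status):
    """Death rate per 100,000 for one vaccination status, all ages combined."""
    people = sum(g[status][0] for g in groups.values())
    deaths = sum(g[status][1] for g in groups.values())
    return rate_per_100k(people, deaths)


# Within each age group, the vaccinated rate is the lower one.
for name, g in groups.items():
    for status, (people, deaths) in g.items():
        print(f"{name:8s} {status:12s} {rate_per_100k(people, deaths):6.1f}")

# Pooled across ages, the comparison flips.
for status in ("unvaccinated", "vaccinated"):
    print(f"{'all ages':8s} {status:12s} {pooled_rate(status):6.1f}")
```

In this made-up example the vaccinated have half the death rate within each age group, yet a higher rate when the groups are pooled, because most vaccinated people are in the older, higher-risk group. Selecting a single wide age band can produce the same effect.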
The author, Allen Downey, excels at making difficult, or seemingly difficult, statistics easy to understand.
This book is a testament to his talent.
The examples he presents are so familiar from everyday life that they convince us a proper understanding of statistics can help us better understand politics, economics, society, and even our own minds.
Why does everyone who passes me in a race seem incredibly fast, while everyone I pass seems so slow? Why are earthquakes and natural disasters so difficult to predict? Why do people diagnosed with the same type of cancer have different survival times? Why are the statements "I'm average" and "I'm normal" wrong? Why do I think anyone who drives slower than me is stupid, and anyone who drives faster is crazy?
Korean society is deeply troubled these days by its extremely low birth rate.
The term 'population cliff' is now being passed around like a buzzword.
Chapter 3 of this book, “Defy Tradition and Save the World,” is particularly timely in that respect.
This is a chapter that I would especially recommend to policy makers.
The author analyzes the impact of China's one-child policy from a statistical perspective and suggests several variables that should be taken into account when developing an appropriate birth policy.
If we look at the issue from a more discerning statistical perspective, we might be able to come up with more groundbreaking policies and ideas to overcome the demographic cliff crisis.
Translation is also a great opportunity to learn.
This book was especially so for me.
I was often eager to find out what would come next,
which made the translation all the more enjoyable.
I hope that readers who encounter this book will also experience that kind of joy.
GOODS SPECIFICS
- Date of issue: April 26, 2024
- Page count, weight, size: 328 pages | 728g | 152*228*19mm
- ISBN13: 9791161758343
- ISBN10: 1161758348