Continued from Part I
In the previous part of this post, we saw how God carefully distinguishes between ‘certain’ knowledge derived directly from divine sources and ‘human predictions’ speculated from skills and knowledge made available to humans in a general way. Claims, results, and theories derived from human sources cannot be raised to the status of ‘certain knowledge’; they must retain a seed of doubt at their core.
This part of the post attempts to present the basis for the seeds of doubt (the hows and whys) present in essentially every field of human knowledge, no matter how scientific or unscientific it is assumed to be. Before turning to the example fields themselves, below is a brief and accessible description of the issue of doubt in scientific research (the vehicle for deriving knowledge in any respectable field).
Doubt vs certainty in scientific research
Actually, the issue of doubt vs certainty in scientific research is closely tied to a study’s claim to be authentically scientific. That is, if a research study cannot provide substantiated details of the level of doubt/certainty in its results, it is rejected outright as inauthentic. A study which does provide the requisite details, but shows a lack of care and rigor in its procedures towards ensuring a high level of certainty of results, is criticized as substandard. Why is this issue at the very heart of progress in research? Let’s see.
The logic of a scientific experiment:
To prove that one thing (a) affects or causes another thing (b) essentially means proving that no other factor (c) was also affecting (b) at the same time. If a person claims to have been worsened by a doctor’s prescription, it is only just to expect proof that no cause other than the prescription was operating at the same time that might have led to the worsening. The doctor should be held responsible only if his prescription was the sole relevant ‘worsening’ factor and no other factors were found. Otherwise, the extent of harm from the prescription should be ascertained, with the doctor liable for losses only in proportion to the harm his prescription caused relative to the other harming factors.
The same logic holds in scientific studies. Let us suppose a pharmaceutical company gives a new heart disease medicine to 100 patients and keeps another 100 patients on a pre-existing treatment. After 6 months, it finds an average improvement in the heart condition of the new medicine group, while the old medicine group shows no change on average. The question is: were there factors other than the new medicine that could have led to the change? Were the 100 of one group similar to the 100 of the other group in the intensity of their disease, ages, genders, previous treatments, other health conditions, diet and lifestyle care, psychological stress and social support, financial security, and so on: all factors that affect heart disease?
If the selection of patients was not done carefully, then there might well be pre-existing differences between the groups also contributing to the different levels of disease condition after six months. For example, if the new medicine group were younger on average than the old medicine group, or financially better off, or if their previous histories of treatment and response to heart disease were superior, any of these factors could have been the sole cause of the observed improvement (or could have contributed to it in addition to the new medicine).
Differences may also exist in the researchers’ handling of the study procedures. Were the same doctors seeing both groups of patients? If a doctor knew which patient was receiving which medication, something in their attitude (for instance, extra optimism when handling new medicine patients) could also have affected results. Or the old medicine group may have found out about the other group receiving newer and presumably more advanced medication (hearsay works everywhere) and felt discriminated against. That could well affect their mental and behavioral attitudes towards their regimen, and hence their disease condition six months later.
A measure of doubt:
Let us suppose the two groups were more or less equal (all 200 being different, unique individuals, they can never be exactly equal). Let us also suppose that at the end of six months a 5% difference in the average level of their heart condition was found. The question is: is this difference big enough to be attributed to the new/old medicine factor, given that other vital differences could not be ruled out 100%? If I claim ‘yes, this difference is big enough to come from the new/old medicine factor’, how certain should I be in my claim, or how much doubt should I acknowledge in it?
Scientists rely on statistics and probability theory to calculate levels of certainty vs doubt in their claims, subjects beyond the scope of the general reader. The levels are reported as percentages. For example, a 95% confidence level means that there is a 95% chance (vs. a 5% doubt) that the observed result is strong enough to be attributed to the factor-in-question.
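For readers comfortable with a little code, the logic behind such a percentage can be sketched in Python. The sketch below uses a simple ‘permutation test’ (one of several standard ways to calculate such chances); every number in it is invented purely for illustration and does not come from any real trial:

```python
import random

random.seed(0)

# Hypothetical improvement scores for 100 patients on a new medicine and
# 100 on an old one (all values invented for illustration).
new_drug = [random.gauss(5.0, 10.0) for _ in range(100)]  # average improvement ~5
old_drug = [random.gauss(0.0, 10.0) for _ in range(100)]  # average improvement ~0

observed_diff = sum(new_drug) / 100 - sum(old_drug) / 100

# Permutation test: if the medicine made no difference, the group labels
# are arbitrary; shuffle them many times and count how often chance alone
# produces a difference at least as large as the observed one.
pooled = new_drug + old_drug
extreme = 0
trials = 2000
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:100]) / 100 - sum(pooled[100:]) / 100
    if abs(diff) >= abs(observed_diff):
        extreme += 1

p_value = extreme / trials
# A p_value below 0.05 means fewer than 5 shufflings in 100 matched the
# observed difference: the conventional 95% confidence threshold.
print(observed_diff, p_value)
```

The ‘95% chance’ is thus not a mystical certainty; it is a count of how rarely blind chance reproduces the observed difference.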
The above scenario is one example of how the issue of doubt arises in scientific research. In my survey of sciences below, other angles may be highlighted, as I have already clarified the most common scenario of doubt. Either way, it will be clear that the germ of doubt cannot be ruled out completely from the sciences. One may say that this applies well to fields involving humans as ultimate subjects, but what about fields such as physics, dealing wholly in lifeless objects? Well, the nature of doubt in those fields may be different, but that it certainly exists will become obvious in a short while.
A survey of example fields, in ascending order of scientific authenticity of research:
Dream interpretation, astrology, hand-reading, psychic practices, and alternative medicine are all parasciences: the knowledge or skills claimed by their practitioners do not pass the test of scientific scrutiny. Either practitioners of such fields do not conduct scientific research at all, their ‘knowledge’ and ‘art’ being based purely on theory, arbitrary observations of individual cases, and history of practice; or, if they do, their procedures of research are not clearly arranged to rule out unnecessary doubt as much as possible, or they do not take the care to report procedures clearly enough to be examined by other scientists. Moreover, when truly scientific studies are conducted to examine their predictions, results are often against rather than in favor.
Social sciences such as economics, sociology, and anthropology often deal with factors of such large scale that the scale of doubt is proportionally large as well. That is because they are commonly concerned with the effects of factors on society or a population at large, rather than on individuals. For instance, economists may study how recruiting policy affects turnover of employees in a certain class of organizations, or attempt to study fluctuations in stock exchange rates in a certain market. Anthropologists concern themselves with human culture at large; they may investigate effects on urban survival, human mobility patterns, or the domestication of landscapes. Sociologists may examine the effects of institutional funding on inequality in educational opportunities, or how leadership styles affect gender discrimination in organizations.
Studying something ‘big’:
Such ‘macro’-level factors (‘variables’ in research terms) are difficult to control in research. For instance, how would you attempt to equalize the characteristics of organizations that affect gender discrimination in addition to the variable-in-question? Many times the context of study is so large that only a single case is included in the study, for instance a study of urban survival in Tokyo. In such cases the effect of, say, inflation may be observed on survival in the city over time. Let us suppose that inflation makes survival in the city more difficult over 8 months. The question is: how can one control the other factors that also affect survival in a real city population, so as to rule out the doubt that something other than inflation affected survival? It is true that experts have attempted to design studies to weed out some of these problems, but each design is riddled with its own set of doubt-casting factors.
Because such tight control of procedures is impossible, academics and professionals in these fields accept a 10% level of uncertainty in their scientific claims. This only refers to studies performed in a ‘scientific’ way; studies in which conclusions are based merely on an analysis of descriptive observations, interviews, or case studies are rejected by critics as not truly authentic.
These large-scale studies have another problem. To what extent do conclusions based on a city in Japan hold true for a city in a different part of the world, or even for another city in the same country? A huge probability exists that any two given cities are different enough (especially when they belong to different cultures, climates, and ideologies) that things operate differently in them even when the same variables are involved.
Sciences such as psychology and management are more concerned with individuals’ behaviors. At this level, at least, control of conditions is better; still, as in the example above, control can never be complete. That is why academics of such fields have settled on allowing a 5% level of uncertainty in results. Doubt cannot be ruled out, only lessened; other sources of doubt may still remain.
The people in the study:
One major issue is ‘sampling’: whom did you include in your study? Just as with the example of Tokyo, the question remains whether findings based on a small group of people can be applied to the larger population from which the ‘sample’ was selected. Ideally, a large number of people should be selected, covering all the various segments of the whole population: the sample’s composition of genders, social statuses, ages, and so on should reflect the composition of the population at large. However, due to the difficulties of funding, time, and effort, researchers typically select people from a convenient location such as a university, shopping mall, organization, or airport lounge. As such, you essentially end up making claims based on actual observation of a small number of people of only a certain type, who may be very different on the whole from the overall population. Some critics have even said that people who agree to participate in such research may be fundamentally different from people who do not volunteer or who refuse to participate. In that case, we will never find out the ‘true’ effect of ‘a’ on ‘b’, as our sample will always be of one type, never including anyone from the other.
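How badly a convenient sample can mislead is easy to see in a small Python simulation. The scenario is entirely invented: a made-up population where the trait being surveyed happens to decline with age, sampled only at a university:

```python
import random

random.seed(2)

# Hypothetical population of 100,000 adults, ages 18-80; suppose the
# surveyed trait (say, risk tolerance on a 0-100 scale) declines with
# age. All numbers are invented for illustration.
population = []
for _ in range(100_000):
    age = random.randint(18, 80)
    trait = 100 - age + random.gauss(0, 5)
    population.append((age, trait))

population_mean = sum(t for _, t in population) / len(population)

# 'Convenience sample': 200 volunteers recruited at a university, so
# nearly everyone is between 18 and 25.
sample = [t for age, t in population if age <= 25][:200]
sample_mean = sum(sample) / len(sample)

# The sample mean overshoots the population mean badly, purely because
# of who happened to be within easy reach.
print(round(population_mean, 1), round(sample_mean, 1))
```

No amount of statistical care after the fact repairs this gap; it was built in the moment the sample was chosen.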
Another major issue is that, from the point of view of the participating person, the situation of a research activity is often significantly different from the same activity performed in daily life. People hold myriad opinions, but do they report them ‘as they are’ in a survey? Just so, we perform certain tasks in real life, but when we do so at a special place (in a psychology lab), or at a special time (at the request of the manager-researcher), knowing that we are being observed, is it really the same? Many times the mental variables studied are such that the tasks created to investigate them have no connection with real life. In real life, our problem-solving skills are tested in real challenges, risks, troubles, and dilemmas; in research, we are presented with case-like or riddle/puzzle-like situations and are supposed to answer related questions. Again, is it the same? Can conjectures derived from observations so artificially removed from the real, thriving, kaleidoscopic, and very ‘personally’ experienced life outside a lab or office really help us in that life?
I am going to deliberately ignore fields such as zoology, botany, and microbiology at this level. The ultimate purpose of science is to advance humankind; all sciences are a way towards this end (just as God proclaims that the whole world has been created to serve humans ↓1). Hence I consider only the fields more directly connected with us: medicine, pharmacology, and neuroscience; and I consider them as a single overall ‘human organism’ science.
When such sciences study internal body processes (such as how a certain chemical interacts with human blood) in an experimental way, the level of doubt is much reduced and the accepted cut-off is a 99% confidence level. In general, the more we move towards ‘inanimate’ subjects (blood by itself is inanimate), the less doubt is allowed for scientific authenticity, since a superior degree of control of procedures is naturally possible. However, since the ultimate goal in all sciences is to apply their technology to living, breathing humans, some of the same problems already mentioned arise. And there are more.
Is the ‘effect’ as large as life?
We should keep in mind the only truly scientifically authentic method of research: the controlled experiment. I have already shared a typical example of it above, in the first section of this post. Confidence in the findings of any such study is enhanced as long as the two groups in the study are identical to each other in all respects other than the factor-in-question (the new drug in that example), as long as the procedures of the study simulate how things happen in the real world, and as long as the people selected for study are similar to the much larger group of people to whom the findings will be applied. Despite all this care, a common problem remains in many experiments: that of ‘effect size’.
In our main example, a 5% difference was found in the heart condition of the two groups after six months. The question is: even if this difference truly resulted from the new drug, is it a big enough difference for practical purposes? Many times, good control and large sample sizes lead to statistical significance (i.e. the difference is deemed big enough to leave only the accepted level of doubt in attributing it to the key variable). But is that difference really meaningful in the real world? For example, here: is this difference enough to increase the comfort and quality of life of the people taking the medicine? Is it worth the cost of the new medicine, or the risk of its side effects? What does it mean for the future of the patient? Will it lead to better prognosis in the long run, prevent fatalities from the disease, or increase life span? What if the new drug leads to complications that build up over a span of years or even a decade? Speculations upon speculations…
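The gap between ‘statistically significant’ and ‘practically meaningful’ can be made concrete with a short Python sketch. The trial below is entirely hypothetical: the medicine’s true benefit is deliberately set to be tiny, yet with a huge sample the significance test still fires. A standard measure of practical size, Cohen’s d, exposes how small the effect really is:

```python
import math
import random
import statistics

random.seed(3)

# Hypothetical mega-trial: the new medicine's true average benefit is a
# tiny 0.2 points on an improvement scale whose spread is 10 points.
# All numbers are invented for illustration.
n = 100_000
old_med = [random.gauss(0.0, 10.0) for _ in range(n)]
new_med = [random.gauss(0.2, 10.0) for _ in range(n)]

mean_diff = statistics.fmean(new_med) - statistics.fmean(old_med)

# Significance: with n this large the standard error is minuscule, so
# the z statistic easily clears the 1.96 cut-off for 95% confidence.
se = math.sqrt(statistics.pvariance(old_med) / n + statistics.pvariance(new_med) / n)
z = mean_diff / se

# Effect size (Cohen's d): the same difference expressed in units of
# the pooled standard deviation; here it comes out trivially small.
sd_pooled = math.sqrt((statistics.pvariance(old_med) + statistics.pvariance(new_med)) / 2)
cohens_d = mean_diff / sd_pooled

print(round(z, 1), round(cohens_d, 3))
```

Statistics can certify that a difference is probably real; it cannot certify that the difference matters to a patient’s life.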
Are all studies truly experimental?
Over the course of my general readings through the years, I was surprised to find out that, just like the social and behavioral sciences, the medical sciences too rely on non-experimental research. The most common type of non-experimental research, both prevalent and widely accepted in academic fields, is ‘correlational research’. In such research, rather than attempting to demonstrate the effect of ‘a’ on ‘b’, the two factors are merely observed to be occurring together in some pattern. For instance, instead of actually controlling a sample’s diet in an experimental way, researchers merely gauge the type and amount of diet nutrients actually being taken by the sample. They also measure, say, the weight or body mass of their sample and see if the two variables (nutrients and body mass) are associated: that is, do cases with high amounts of fat-promoting nutrients in their diet also exhibit higher weight (and vice versa)?
All that such research establishes is that two factors are ‘seen’ to be going along together: where one is at an increased level, the other is also increased (or, where one is increased, the other is decreased; that too is a pattern of correlation). Such research cannot explain why these factors are seen varying together. That is, you can never claim: the nutrient type is responsible for, or causes, the weight levels in these people. The reason is obvious: so many other influencing factors were not controlled. Even the key variable in question (nutrients, here) was not controlled by the researcher. The subjects in the sample did not all take the same food in the same fashion at the same place; merely their reports of their daily lives were taken. How can we even assume, in that case, that nutrients are causing the body mass variation observed across the sample?
It’s like this: if you notice that during the months of May and June rubber in household products melts easily and our skin frequently develops rashes, should you conclude that particles from the melting rubber are leading to the rashes? No! There are so many other possibilities. Each of these two things could have its own separate cause, the two just occurring together for some reason. Or both could be the ‘effects’ of a third cause, which in this case is obviously the high heat of these months. (Although results of such research are also reported in terms of confidence vs uncertainty, that only amounts to: ‘is the degree of association between the two a real correlation or merely a coincidence?’)
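The rubber-and-rashes scenario can itself be simulated in a few lines of Python. In the toy data below (entirely invented), heat is programmed to drive both rubber softness and rash counts, while nothing links rubber to rashes directly; yet the two correlate strongly:

```python
import random

random.seed(1)

# Hypothetical monthly observations: heat is the hidden 'third cause'
# driving both rubber softness and rash counts. Nothing in the code
# links rubber to rashes directly.
months = 24
heat = [random.uniform(10, 45) for _ in range(months)]    # temperature
rubber = [0.5 * t + random.gauss(0, 2) for t in heat]     # softness index
rashes = [0.8 * t + random.gauss(0, 3) for t in heat]     # rash cases

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Rubber and rashes correlate strongly, yet neither causes the other:
# both merely track the shared cause (heat).
print(round(pearson(rubber, rashes), 2))
```

The correlation is perfectly real, and perfectly silent about causation; that silence is the whole point.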
Now it turns out that some of the famous ‘facts’ of our lifetime, such as ‘cholesterol is linked to heart disease’ and ‘use of sunscreen reduces risk of skin cancer’, are merely ‘correlational’ rather than ‘causal’ facts↓2. I will take up here in detail two cases which interest me.
Cholesterol–the innocent culprit?
Cholesterol is always observed at the site of damaged blood vessels in heart disease, but nobody has ever (and could never have) induced people into high vs low cholesterol levels and, after controlling all other related factors, established for sure that the high cholesterol group developed the problems associated with heart disease (angina, aneurysms, and heart attacks). It is a combination of scientific diffidence, media sensationalism, and pharmacological marketing which has firmly grounded the mere ‘association’ as a ‘fact-beyond-doubt’ in both practitioners’ and the public’s minds.
In fact, all kinds of myriad and mutually contradictory evidence exist about cholesterol. There are many academic sources on the topic↓3 but, as an example, the following paragraph from a research project↓4 designed to demystify the cholesterol myth should be sufficient (I have converted the paragraph into numbered items for clearer presentation):
There are also many situations and scenarios that cannot be explained by the cholesterol-theory.
1. Familial hypercholesterolemia (FH) is a genetic disease causing severely elevated LDL levels (in excess of 500 mg/dL). An FH patient would be expected to be critically at risk of heart disease, and yet there are many cases of entire families with FH never suffering a single heart attack.
2. Other examples logically refuting the hypothesis include tribes of nomadic peoples in Africa whose diets consist principally of fresh red meat, high in cholesterol, who have no history of heart attacks whatsoever,
3. and the recent Vytorin study, which showed that despite a particular cholesterol-reducing drug regiment, patients on the drug developed atherosclerosis at double the rate. (p. 5)
Another area of biological research, one which inspires considerable awe, wonder, and amazement in the educated section of the public, is neuroscience: the one scientific field that attempts to cut across all the essential fields of knowledge and decipher the riddles of how the mind, the body, and the world relate to each other.
Areas of the brain and their functions–certainly?
Neuroscientists have progressed considerably in their knowledge of which psychological functions (such as mental activity, memories, emotions, and social aptitude) are ‘performed’ by which brain areas. The typical methodology has been to engage subjects in tasks utilizing the function-of-interest and to simultaneously record brain activity patterns through one of the prevalent brain-imaging technologies of the time. They then try to correlate the most active brain areas with the persons’ engagement in the tasks. Not only is this approach obviously correlational, but the modern ease of recording brain functioning, given the multitude of technologies now available, is making many scientists too relaxed in their approach to such research↓5.
What happens before statistically calculating the correlation is that the scientists first look for those brain areas showing high activity during the task. They then correlate those same areas with the activity measure in the same subjects. The problem is that this kind of pre-selection can be highly misleading, given the correlational nature of the research. That certain brain areas are seemingly active in that sample does not necessarily show a true relation: it could be an artefact of the observed subjects’ individualities (note that the sample sizes in such studies are usually very small, often in the thirties). It could be a ‘third cause’ scenario: the engaging activity is leading to a process in the brain (not detected by the recording method in the study) which could in turn be associated with certain brain areas’ activity in a complicated fashion. Note that a lot of brain areas are active during any task; scientists here select only the highest-activity ones and then use only those to measure the correlation. Now it is quite possible that the high activity of those areas is not the underlying mechanism directly managing the person’s behavior; instead, a complicated pattern of high, moderate, and low activities in a string of areas could be responsible. One can’t say.
The Neuroskeptic blog sums up the problem better:
The essence of the main argument is quite simple: if you take a set of numbers, then pick out some of the highest ones, and then take the average of the numbers you picked, the average will tend to be high. This should be no surprise, because you specifically picked out the high numbers. However, if for some reason you forgot or overlooked the fact that you had picked out the high numbers, you might think that your high average was an interesting discovery.
Here are the comments of the critics (referenced in footnote 5) who originally brought the problem to light:
any measures obtained from such [an] … analysis are biased and untrustworthy.
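This selection trap is easy to demonstrate for oneself in a few lines of Python. The simulation below is my own illustration, not taken from the cited papers: it draws pure noise, picks the highest values, and averages them:

```python
import random

random.seed(0)

# 1000 'brain area' activity values drawn from pure noise (mean 0):
# by construction there is no real effect anywhere.
values = [random.gauss(0, 1) for _ in range(1000)]
overall_mean = sum(values) / len(values)          # hovers near 0

# Pre-select the 20 most active 'areas', then average only those.
top = sorted(values, reverse=True)[:20]
selected_mean = sum(top) / len(top)

# The selected average looks impressively high, but that was guaranteed
# by the act of picking the high numbers, not by any real discovery.
print(round(overall_mean, 2), round(selected_mean, 2))
```

Even with no effect at all in the data, the pre-selected average comes out far above zero, exactly as the critics warned.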
My survey is taking longer than expected. I must continue the rest in a Part III. I will InshaAllah finish the survey with physical sciences and mathematics; ending with a synopsis providing perspective on what we are learning from the survey.
Till then, fi amana-Allah.
2. One of the best resources on this topic is Robin Baker’s book Fragile Science: The Reality Behind the Headlines, 2002, Pan Macmillan.
3. In addition to many literature reviews available fully or partially on the internet, any current textbook on health psychology should also suffice, providing a multitude of cross-references.
4. Letourneux, J., Ryder, M., Stone, C., and Waring, C. 2008. Mythbusters: Cholesterol. An interactive qualifying project submitted to the Worcester Polytechnic Institute. Found at http://www.wpi.edu/Pubs/E-project/Available/E-project-050508-202532/unrestricted/Mythbusters_Cholesterol_IQP.pdf
5. These two linked research articles investigate the issue: http://www.pashler.com/Articles/Vul_etal_2008inpress.pdf and http://www.nature.com/neuro/journal/v12/n5/abs/nn.2303.html. For a relatively accessible but still technical account of the two, interested readers should go to these posts at the Neurocritic blog and the Neuroskeptic blog, respectively.
Related posts from this blog:
SCIENCE|RELIGION: Observations of a scientist upon science and reality