Is it dishonest to remove outliers and/or transform data?

Hey guys, I have actually never written a blog before so I’m a little bit clueless about all this but I’m going to give it a go and hope for the best!

So many research studies are conducted within Psychology, and the results play a major role in supporting or challenging a researcher’s hypothesis. The results need to be as accurate as possible so that the findings are representative of the general public and can be used by scientists, doctors etc.

Once all the data from a study is collected, it is analysed before it is shown to the scientific community. However, the results usually contain some data points that we call outliers. These are results that lie ‘far away’ from the rest of the data. This can be due to systematic error or simply a fault in the procedure. Outliers have also been defined as values that are “dubious in the eyes of the researcher” (Dixon, 1950). However, a small number of outliers are expected in a study and I think they are important as they can change the outcomes of our data analysis.
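To make the idea of a value lying ‘far away’ from the rest a bit more concrete, here is a minimal Python sketch. It is only an illustration: the reaction times are made up, the function name `iqr_outliers` is hypothetical, and the 1.5 × IQR cut-off (often called Tukey’s rule) is just one common convention for flagging outliers, not a rule this post relies on.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Invented reaction times in milliseconds; 1450 is the suspicious value.
reaction_times = [412, 398, 430, 405, 421, 415, 1450, 409, 418, 402]

flagged = iqr_outliers(reaction_times)
kept = [v for v in reaction_times if v not in flagged]

print("flagged as outliers:", flagged)
print("mean with outlier:", statistics.mean(reaction_times))
print("mean without outlier:", round(statistics.mean(kept), 1))
print("median with outlier:", statistics.median(reaction_times))
```

With these invented numbers a single extreme value pulls the mean from roughly 412 ms up to 516 ms, while the median barely moves. That is the sense in which one outlier can change the outcome of an analysis, and it is why any decision to drop it needs to be reported.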

Some people would say that removing outliers or transforming the findings is manipulating the data, so that the research shows what the researcher wants it to show and is therefore not a truly representative study. Sometimes this happens because the investigator has a hypothesis to support, so they try to make the findings fit the hypothesis as closely as possible.

It is fairly common in clinical psychology research for the researcher to remove patients from the analysis, but I think this decision has to be justified and explained. An outlier may not come from an experimental mistake; it may arise because the individual is genuinely different from the others.

To conclude, I think that it is not dishonest to remove outliers even if it transforms the data. By removing the outlier the results will be more accurate and create a more reliable set of data. However, in some cases where individual differences should be taken into account, the outlier should be included.

Hope this makes some sort of sense and please leave me a comment!




Posted on September 30, 2011, in Uncategorized. 12 Comments.

  1. You make a good argument as to why outliers should be allowed to be removed from the results. However, surely there is a reason why these anomalies occurred, and therefore they should be included. In some research it would be considered unethical to disregard an individual’s results. For example, in some research studies that are considered to be invasive there are strict guidelines which have to be followed, including that all data collected should be included in the analysis process. Most invasive studies use animals. If an animal has undergone intrusive treatment to gain scientific information, it would be unethical not to include the data even if it is a so-called outlier.

    • The anomalies may have occurred because of procedural errors, and therefore should not be included. In scientific research the data needs to be as accurate and reliable as possible, so having data that doesn’t fit the norm would not be beneficial for the overall outcome of the research. I think it depends on what sort of study it is; I do think some outliers and individual differences should be taken into account. I also agree with your viewpoint on the invasive studies on animals, as ethics should be taken into account.

      • I understand that some anomalies are caused by procedural errors and would affect the outcomes of the study and results. However, how can you decide which outliers have been created through errors during the study and which are real abnormal occurrences? Surely instead of removing data you should include it and explain what could have caused the outlier in the methodology or summary section. A replication study could also help establish whether this was an anomaly that didn’t occur again or something more complex. How can you establish reliability if you have taken data out to suggest consistency? This technique can be misused, and it can also increase the chance of Type I errors, where the null hypothesis is wrongly rejected and the alternative hypothesis wrongly accepted.

  2. I agree that removing outliers can be justified. I think that if the researcher decides to remove the outlier they should be able to justify that decision and do so through reasonable thinking and a lot of consideration.

    I do not think that removing outliers is manipulating the data as outliers are not a true representation of the data anyway. Like you said, sometimes they can be created by experimental error; in this case I do not think it would be manipulative of the results at all.

    However, if people were to remove outliers this could result in unreliability if other psychologists replicate the study or conduct a similar study. If this was the case then the results may not match, creating the issue of external reliability.

  3. I just wanted to say first that I really liked your blog. I thought it was a really interesting subject and it was really clear and easy to read from both sides of the argument 🙂

    But I believe that researchers should not remove outliers and transform their results, and that to do so would be dishonest. As you said in your blog, a certain number of outliers are expected, so surely when others are reviewing a certain set of data they will take this fact into consideration; not all research is going to be perfect. Surely then, if researchers remove parts of their data that don’t fit, it does not show the full picture of their research and implies they have something to hide? I think that 100% of the data should be taken into account when looking at someone’s research, as even the smallest change can make a huge difference in the way people see that research.

    • Thank you, I find blogging really difficult!
      I do get what you are saying but sometimes you have to take out outliers if they are going to ruin the results. I think it’s fine to remove them if they are just down to a procedural error; you are bound to get some results that don’t go perfectly. Researchers may not be able to prove their hypothesis then if they have to take into account all the outliers, so then their research would become pointless 🙂

      • “their research would become pointless”

        Research studies in which the null hypothesis is supported are just as useful as those in which the hypothesis is supported! Whichever way the results come out, the research can’t possibly be pointless: there is always something to be learnt from an experiment, even if this is only (for example) highlighting the need to use different research methods in order to continue developing an understanding of the issue under investigation.
        Recognising all research as significant and useful is imperative in order to avoid the ‘file drawer’ problem, formally known as publication bias. This refers to the tendency to publish only research in which a hypothesis is supported (because let’s admit it, that’s the kind of research we’re excited to read about).
        Publication bias can lead to misunderstanding in psychology. Certain hypotheses can begin to appear heavily supported due to a failure to publish existing research to the contrary.
        In short, no research is pointless research! This is something which perhaps should be emphasised more thoroughly in the study of psychology, in order to discourage practices such as removing outliers which could potentially have provided another level of insight into the behaviour of the participants.

  4. Dear Vicky, I enjoyed reading your blog. I like how you gave a simple but clear explanation of what ‘outliers’ are, and your critical thinking on both the advantages and the disadvantages of removing them. You showed a good understanding of the possible causes of outliers, and of how some outliers are in fact caused by individual differences. To improve your post, I would suggest giving an example of how individual differences may cause outliers in the data that researchers gather. Furthermore, you could enhance your interesting blog with an example of how the removal of ‘outlier’ studies can occur in peer review when experts in the field are biased towards their own findings. I agree with your point that it is not dishonest to remove outliers, as sometimes outliers make it difficult for us to improve the validity of work in the area. However, we can also view it from the other side and argue that removing them prevents paradigm shifts. Overall, your post is good. Well done 🙂

  5. I agree that a small number of outliers are expected in a study as not all research produces the expected or ideal results. However, I believe that if an outlier is due to a fault in the research or experiment itself, then surely the study lacks validity and is therefore not a reliable test in itself?
    This is often why pilot studies are conducted before the real study as it identifies any confounding variables that could interfere with the study, ensuring the real study is as reliable as possible. This means there shouldn’t be any outliers in the research as issues with the study itself would have been dealt with in the pilot study.

  6. I like that you have clearly presented both sides of the argument. I agree that some outliers are caused by individual differences, but it is vital to consider that outliers may also be the product of issues within the study itself and could be an indicator that the study needs revision. So, in this case outliers may be useful as they can help the researcher to improve their original hypothesis or study.
    Furthermore, I feel that removing such outliers would be a misrepresentation of the data, as the researcher is ‘picking and choosing’ what to include in their results, which therefore don’t accurately reflect the findings of their research. So I think it is important that outliers are included in the data because, although small and seemingly inconvenient, they may in fact be significant to the research topic.

  7. Personally I agree with your viewpoint that outliers should not be removed, as I believe it is dishonest to remove a participant’s results even if the results are due to procedural errors. The participant has still contributed to the study in some way, and we cannot delete their results just because they do not match the original hypothesis, even if they do confound the conclusions.

    One way, though, in which we could make an outlier less significant is by gathering data from more participants: the more data we gather, the clearer the trend we will be able to see, and the less the outlier will affect the totals and conclusions.

    Overall though I did enjoy reading your blog; it contains some well thought out and balanced arguments.

  8. I enjoyed reading this blog. I liked the fact that you didn’t unnecessarily over-complicate your arguments. It was clear and understandable, which I think is important :). I also liked how you evaluated the argument from both points of view. I personally am of the opinion that in most cases outliers should be left in, in order to gain a full and accurate perspective of the results. I think the fact that researchers can just remove results because they do not support their hypothesis is part of the reason why many people distrust statistics. There may be some cases in which it is appropriate to remove the outlier, but I know that if I were conducting my own research I’d prefer to keep all of the data I collected and reach an honest, objective conclusion from ALL of my findings, not just my preferred ones.
