Just the other day I got a phone call from a reader of our blog. They were wondering why we used median values rather than average values in our post on Land and Acreage for sale in the Augusta, GA market. I considered giving a more detailed description in that post, but as it was already long enough I decided it was probably unnecessary. In retrospect, it was an error to omit the explanation of what the median is and why I used it. I’m going to use this post to correct that error.
The short answer is that the median is the value in the exact middle. In statistics, it is the middle number in a sequence of numbers that has been sorted from lowest to highest. It is also sometimes referred to as the 50th percentile. In other words, in any given set of data values, it is the value at which half of the data values fall below the median and half fall above it.
In a data set with an odd number of values, it is simply the middle value once the values are sorted. For example, if your data set is 1, 2, 3, then the median value is 2. If your data set has an even number of values, such as 1, 2, 3, 4, then the median is calculated by averaging the two middlemost values. For the data set of 1, 2, 3, 4, you add the two middlemost values and then divide by two. Thus you get: (2+3)/2. This gets you the median value here of 2.5.
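The rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median_by_hand` is made up for this example, and Python's standard library already offers `statistics.median`, which does the same job.

```python
from statistics import median  # standard-library version, used only as a cross-check


def median_by_hand(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)  # the values must be in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the average of the two middlemost values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median_by_hand([1, 2, 3])
even_result = median_by_hand([1, 2, 3, 4])
print(odd_result)   # 2
print(even_result)  # 2.5
```

The two calls reuse the data sets from the paragraph above, so the printed values should match the hand calculation.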
So why did I use the median value in this post dealing with land sales prices and days on market (DOM) as opposed to averages? Simply because it reduces the likelihood of the information becoming misleading due to outliers or data values that are grossly out of the norm. Let me explain using an example.
Let us suppose that you are opening a new company selling widgets, and you are attempting to price your widgets competitively in your market. You have a data set of what some competitors are selling their widgets for. It is as follows:
|Recent Competitor’s Sales Prices of Widgets|
We’re assuming here that all widgets are priced in dollars. If we take the average, or mean, of these data values, we come up with a widget value of $10.62. Is this really an accurate representation of what widgets on the whole are selling for? Would your price at $10.62 be competitive? Not really. If you priced your widgets at $10.62 you would be more expensive than almost everyone, with the exception of the last two in this data set. We are dealing with limited data here, as one would when looking at land in an MLS. Those final two may be selling gold widgets or diamond encrusted widgets for all we know, and that may explain their drastic difference in price from the others in the data set. The mean is skewed by the two outliers of 24 and 39. They raise the mean above the range of what the great majority of widgets are selling for in this market.
What about the median? What is its value in this set? To find it, one must first put the data values in order.
As this data set has an even number of values, to calculate the median one must average the two middlemost values. The two middlemost values are $7 and $7. Therefore the median is $7. The mean value of $10.62 is roughly 52% higher than the median value. I would argue that the median value of $7 is a much more accurate representation of the value of a widget in this example market than the $10.62 mean value. At $7 you would have a very competitively priced widget. At $10.62 you are clearly outpriced by almost all of your competition. The median is a statistic that resists the effects of outliers such as the $24 and $39 widgets in our example.
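For anyone who wants to check the arithmetic, here is a small Python sketch. The first part uses only the two figures quoted in this example, the $10.62 mean and the $7 median. The second part uses made-up prices: the lists `typical` and `with_outliers` are hypothetical values invented for illustration and are not the competitor data from this post. Only the two outliers of 24 and 39 are borrowed from it.

```python
from statistics import mean, median

# Figures quoted in the widget example.
quoted_mean = 10.62
quoted_median = 7.00

# How far the mean sits above the median, as a percentage of the median.
gap_pct = (quoted_mean - quoted_median) / quoted_median * 100
print(f"The mean is {gap_pct:.1f}% above the median")  # 51.7%

# HYPOTHETICAL prices, invented for illustration only.
typical = [5, 6, 6, 7, 7, 8]
with_outliers = typical + [24, 39]  # add the two outlier prices

print(mean(typical), median(typical))              # 6.5 6.5
print(mean(with_outliers), median(with_outliers))  # 12.75 7.0
```

In the made-up data, adding the two expensive widgets nearly doubles the mean while the median barely moves, which is the same behavior described above.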
This dampening of the effect of outliers is why I chose to use the median value. It is also why many statistical analyses use the median rather than the average to describe data sets. Averages can be very deceiving in the right circumstances. I will always try to present the data and information in this blog as clearly and accurately as possible.
This of course reminds me why I am such a skeptic of studies and statistics. By manipulating the sample size, the population size and which measures you use, you can make statistics say anything you want. Count me as a disciple of the school of thought popularized by Mark Twain and often attributed to Benjamin Disraeli, “There are three kinds of lies: lies, damned lies, and statistics.”