Sports Medicine Research
Video Transcription
Again, I'm Bruce Miller. I serve as one of the co-chairs with Dr. Kaeding. I'm in Ann Arbor at the University of Michigan. I'm going to cover sports medicine research, and listen, many or most of us are not actively engaged in research. However, I suspect every single person in this room is a consumer of orthopedic literature. You should be reading journal articles every now and then, and if you are, this information will make you a more sophisticated consumer. There will be two to five questions on this material on the test. If you pay attention, you'll get them. Put your pens down for a bit, because the early stuff is going to be conceptual stuff. Wrap your head around it. I will highlight the areas that I think are going to be testable material, and we'll have some questions at the end. I have no disclosures. This is the outline for this session. I happen to do research, but whether I'm doing research or consuming research in the form of reading an article, there are a number of things that I go through as a checklist, and they're included here. We'll go through them one by one. One is the concept of the expected size of effect, whether it's a research project or a surgical intervention. We all want our work to be meaningful. We want our work to be the shot that's heard around the world. The question is, is your intervention strong enough to have a chance of having a detectable effect, yes or no? Even if it is detectable, is it clinically relevant? I want to spend a minute talking about this concept that something can be statistically significant but clinically irrelevant. I'll use examples from the orthopedic literature. Christian Gerber is a preeminent shoulder surgeon in Switzerland. He published in the JBJS years ago the landmark paper on the use of latissimus dorsi transfer for irreparable rotator cuff tears. Great paper. One of the conclusions was external rotation increased from 22 degrees to 29 degrees, and there's a p-value there. We'll talk about what that means later, but let me tell you that this is a statistically significant difference. If you look over here, I'm moving my hand from 22 to 29 degrees, and if you can barely see me moving, it's because I'm barely moving. In my mind, although it's statistically significant, it may be clinically irrelevant. How do we distinguish between statistically significant differences and those that are clinically relevant? We use a measure called the MCID, the Minimal Clinically Important Difference, sometimes the MID. And it's a statistic. You don't need to know what it is, but it's a threshold by which we measure clinical relevance. So whatever your field is and whatever you choose to study, you should know that each outcome measure has an MID associated with it. And we get this statistic or this value based on previous studies, and that's not relevant, but how we use it is relevant. So I do a lot of rotator cuff research. I follow about 1,000 patients with rotator cuff tears prospectively. This is some data from my database. The orange line represents patients with full thickness rotator cuff tears who have had surgery. The blue line represents a population of patients who have rotator cuff tears who, for whatever reason, opted not to have surgery. And I will tell you that both populations, and we're not comparing apples to apples, but both populations demonstrate a statistically significant improvement over time. But is it clinically relevant? I told you about the MID. So in this case, this is the ASES score. 
The MID for the ASES score is about 20 points or 20%. So how do I use that in practice? At baseline, my surgical group is at about a score of 50. So you take your baseline score, add the MID, which brings us to about 70, and that's the threshold. And you can see the surgical group very rapidly crosses that threshold and stays well above it. So this is not only a statistically significant difference, but a clinically relevant one. Now contrast that, again with real data, to the non-surgical group. I take the baseline, mid to high 50s, add the MID for this particular score, which is about 20. What you'll see is, although they improve over time with a statistically significant change, they never approach clinically significant improvement. So this may be an item in your clinical practice that warrants further thought. Are we really doing our patients a good service with this method? Because they're just not getting better in a clinically relevant way. So think about MIDs, MCIDs. Most of your journal articles should start incorporating that. OK. A research topic should be applicable to you. Is the intervention you're going to study or the journal article you're going to read going to be implemented in clinical practice? Yes or no? So here's a nice article out of the European literature looking at the use of indwelling femoral catheters after ACL surgery. Is pain control a problem for you after your surgeries? If yes, great. Do the research project. Read the paper. If not, move along. Time's valuable. There's a concept of external validity. This is an important one. Whenever we do a research study, we have to select a study population or a study sample. The big box represents everyone in the United States that has a rotator cuff tear. I would love to study every single person, but it's not practicable. So what I do instead is I select a sample of those patients represented by the small box. I study those because it's manageable, and then hopefully the conclusions I reach for those are applicable to the greater population or to your practice, and that's called external validity. We lose external validity when we either have really small sample sizes, and we'll talk about the issue of power in a bit, or when we study kind of weird populations. I live in a small Midwest college town, and that's where my studies are, and that may not necessarily reflect your practice if you're in a big city on the coast. Keep that in mind. Here's a publication, again, out of Europe. I'm a team physician for a major college football team. I think hamstrings and knee injuries are interesting. I know absolutely nothing about Irish Gaelic footballers. So while this is a published paper, this has no relevance to me. I don't know how to apply it. So for me, this does not have great external validity. We're going to talk about control groups now. The best research is comparative research, testing treatment A versus treatment B. And when you have two treatment groups, your study groups to the best of your ability should look alike in every single way possible and differ only in the treatment rendered. For example, if you wanted to study the latest and greatest screw, metal versus bio, 19th generation, it would be really nice to have the Barber twins, for example, who are identical twins and are exactly the same in every way and, in fact, in exposure too, as professional football players, and they would differ only in the type of screw used. Now, that's not always feasible. 
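Circling back to the MCID arithmetic above, here is a minimal Python sketch of the threshold idea. Only the approximate ASES numbers quoted in the talk (baseline around 50, MID around 20) come from the lecture; the helper name and the follow-up scores are illustrative assumptions, not data from the speaker's registry.

```python
# Minimal sketch of the "baseline + MCID = threshold" idea from the talk.
# The ASES-style numbers are the approximate values quoted above; the
# follow-up scores are made-up illustrations.

def clinically_relevant(baseline: float, follow_up: float, mcid: float) -> bool:
    """True if the improvement from baseline meets or exceeds the MCID."""
    return (follow_up - baseline) >= mcid

# Surgical group: baseline ~50, follow-up climbs well past the ~70 threshold.
print(clinically_relevant(baseline=50, follow_up=85, mcid=20))  # True: clinically relevant

# Non-surgical group: a statistically significant gain that never reaches the threshold.
print(clinically_relevant(baseline=57, follow_up=68, mcid=20))  # False: below the MCID
```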
So there are a number of different ways in which we generate control groups, and this is a list from worst to best. So using literature controls, digging out some old papers, or using historical controls from your own practice, it's okay, but it's chock full of biases. Sometimes it's an okay place to start if you don't have any other data, but as you move down that line, you know, the gold standard for a control group is a randomized control group. Why? We'll talk about this in a bit, but let's just say you're doing an ACL reconstruction study, looking at hamstrings and patellar tendon. We think we know a lot of the factors that influence the outcomes, gender, activity level, BMI, age, whatever it may be, but there's probably also many, many, many other factors that influence the outcome that either we're not thinking about or we simply just don't know. So by randomly assigning our patients to two different treatments, we essentially distribute the known factors as well as all of the unknown factors equally so we can really compare the treatment rendered. So a random assignment to control groups is the best way to do it. There are a lot of outcome measures in the orthopedic literature. I'm not going to list any of them. Whatever your field of interest is, know the outcome measures, make sure they're valid, and get a sense of what their MID or MCID is so you know how to interpret them. One of the biggest problems in orthopedic research is our studies are too small, and all of our research projects start with what's called a sample size estimate. You need to have a big enough study in order to be able to make conclusions, either a valid statistical assessment or simply looking at a big enough population so you can discern differences. How do we do this? We do it at the beginning of a research project. And your study size can be too small, it can be too big, but you'd like it to be just right. If the sample size is too small, you may simply not have enough data to find a difference, and the difference may be there, unfortunately. Alternatively, if you study too many subjects, you may be wasting a lot of resources, time, money, effort. But remember, if you look at a lot of variables, we accept a 5% error rate, which you'll learn about in a minute. That means if you look at 20 different variables, you may find that one proves to be statistically significant when it's completely irrelevant, and it just happens to be a statistical error. So don't look at too many things either, so you want to get it just right. And the way you do that is you do this calculation before a research project called a sample size estimate, and I'll talk you through the highlights of that. But here's an example where things can go wrong. Gary Gartsman, a preeminent American shoulder surgeon, did a randomized trial looking at total shoulder arthroplasty versus hemiarthroplasty for arthritis. And I think those of us that do shoulder arthroplasty know that total shoulder arthroplasty, putting in a glenoid component, renders better results. And this study's been reproduced now a number of times. Well, Dr. Gartsman and his crew did their homework. They did a pre-study sample size estimate, knowing that they needed to have 35 subjects per group. Well, what happened is when they reported their paper, they fell short of the 35 subjects per group they probably needed for statistical analysis, ending up with only 51 subjects in total. And guess what? 
Most of the outcomes were inconclusive, despite the fact that we now know there's a difference between these two treatment options. So stick to your guns, do a sample size estimate, and make sure you achieve it. So I mentioned the concept of power. This is often a test question. When you do a statistical analysis, power is the probability of identifying a difference in two populations when one truly exists. So what we're trying to do with power is minimize false negatives. By convention in our literature, we accept a power of 80%. That means 20% of the time there will be a difference between two populations that we just don't discern because we're not looking hard enough. So we accept a 20% false negative rate. How do we improve our power? This is the mantra. They love to test this one. There is power in numbers. By simply increasing your sample size, you will increase your power. You will increase your ability to find that difference if it really exists. Errors are also tested. And without getting too sophisticated about this, there are two types of errors in statistical analysis. There's a type 1 error, which is a false positive. You say there is a difference between two things when there really isn't. That can be dangerous. We only accept a 5% false positive rate. And there's a type 2 error, a false negative error. We accept a 20% rate there. You conclude that there is no difference when there truly is. That's it: type 1, false positive; type 2, false negative. So how do we actually do this sample size estimate? There are very few factors that weigh into this. There's something called the effect size, the variance, and then the two types of errors I just discussed. And I'll take you through an example of how we do this. So I do rotator cuff research. And let's just say I want to do yet another randomized trial looking at arthroscopic versus open rotator cuff repair. So I'll use an outcome measure that's well validated, the WORC, the Western Ontario Rotator Cuff Index, because I know that I have a sense of what the MCID will be. I have a sense of how this falls out in the normal population. So I have a good sense of what the means and variations would be. I don't worry about that too much. So these are the three factors that we need to consider: effect size, variance, and error and power. So how does effect size weigh in on this? The greater the effect size, or the greater the difference you would like to detect between the two treatment groups, the smaller the sample size you need. So here's an example. All else being equal, if you think there's a 50% difference comparing between the two treatments, then you'll only need 14 subjects for this particular study. And this is real data. However, if the difference is going to be quite small, if you think it's going to be a 10% difference, then you need 319 people per group. So again, the bigger the change, the fewer subjects you need. How about variance, variability? The greater the variance or the greater the variability you have in your population, the larger the sample size you need. Again, an example, real data, controlling for everything else. If you have a small, tight standard deviation, you need very few patients, four. If your patients are all over the place, a standard deviation of 700 with a mean of 1,000, that's a very variable population, you'll need 125 subjects per group. So these are real numbers, and I hope you get the impression of the trends here. The more variance, the more subjects. 
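For readers who want to play with these trade-offs, here is a rough Python sketch of a pre-study sample size estimate using the standard normal-approximation formula for comparing two means. It is not the speaker's actual software, and the standard deviations and detectable differences below are made-up illustrations rather than the WORC values from the talk.

```python
# Rough sample size estimate per group for comparing two means:
# n = 2 * ((z_alpha/2 + z_power) * sd / delta)^2
# alpha = type 1 (false positive) rate, power = 1 - type 2 (false negative) rate.
import math
from scipy.stats import norm

def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per group needed to detect a mean difference `delta` given SD `sd`."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided 5% false positive rate
    z_power = norm.ppf(power)          # 80% power, i.e. 20% false negative rate
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)

# Bigger expected effect -> fewer subjects per group (illustrative numbers).
print(n_per_group(delta=20, sd=25))  # large difference: small study
print(n_per_group(delta=5, sd=25))   # small difference: much larger study

# More variability -> more subjects per group, for the same detectable difference.
print(n_per_group(delta=10, sd=15))
print(n_per_group(delta=10, sd=40))
```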
And then lastly, this power business, I already told you. The way to increase your power, the way to increase your ability to look for a difference if there is one, is by increasing your sample size. Power in numbers. And you can do this exercise, and you can say, I need 104 subjects per group. And then you have to decide if that's feasible, do you have enough money, is there attrition? But that's not relevant for testing. But the other concepts are, there's some free software online if you want to play with these numbers. You can plug these numbers in and see how these things play off each other. We're going to move ahead and look at study design. We categorize these in several ways. A study is either observational or experimental. Observational is you're looking at a population, your patients, and you're describing them. An experimental study is when you actually do an active intervention. I have a patient, and I'm actually going to intervene and study them over time. And then a study is either retrospective or prospective. A retrospective study is we start the study today, and we look back in time at things that have happened in the past, maybe our patient population from when you started your practice. A prospective study is a study that we start today, and we track our data moving forward. Prospective tends to be a little bit better for a number of reasons. I'm going to go through three, actually, I'll go through two of these in great detail. These are very common study designs. Unfortunately, case series litter our literature, but I'll talk about two more powerful study designs that I think you should know about. One is called a case control study. And let's start over here on the right side of the screen. And I apologize that the colors didn't print well in your book, but have a look at the screen here. A case control study is a really nice way of looking at a rare disease. So let's go back in time, I don't know, let's say the early 1900s. People weren't aware that actually smoking caused lung cancer. And you're a general doctor and all of a sudden you're seeing a rash of lung cancer patients in your practice. You have absolutely no idea what's causing it, a rare disease. So what you do is a case control study. You take your group of lung cancer patients, these blue guys, and then you find a control group. These are your cases. Your control group are these red guys, and they don't have lung cancer. And they differ only in the fact they don't have lung cancer. You try to match these in every way. Same age, same gender, same occupation. The only difference between these two populations is the blue guys have lung cancer, the red guys don't. And then you ask a series of questions. What you're trying to do is tease out what exposures these cancer patients may have had that led to their cancer. We're not looking for causation, just association. So when you take the history, you find out that three of the five lung cancer patients were smokers and zero of the five non-cancer patients were smokers. And then you can do a statistical analysis, and you can hypothesize that there's clearly an association between the exposure of smoking and the development of lung cancer. Another way to do this is prospectively, not looking back in time, but moving forward with a cohort study. Let's say you don't know if patellar tendon is better than hamstring and resulting in your outcome being arthritis down the road. So what you can do in this case is your red guys are your patellar tendon reconstructions. 
The group on the bottom is your hamstring reconstructions. You follow them over time. You find that three out of five in your patellar tendon group are getting arthritis of the knee, and only one out of five in the hamstring group are getting arthritis. So you say, you know, maybe there's an association here between patellar tendon graft versus hamstring graft as an exposure risk for the development of arthritis. So that's a cohort study moving forward. Those are types of observational studies. An experimental study would be an intervention, a clinical trial. We do something. It can be in the laboratory. It can be in surgery. And then we observe the outcomes. And I told you that the gold standard would be a randomized clinical trial. You take a group of patients with ACL tears, and then you randomly assign them to either hamstring or patellar tendon reconstructions. Why? Because we truly don't know what factors lead to or influence outcomes. So we distribute all those factors equally in these two populations by randomizing them, and then we follow them over time. And that's the purest way to discern a difference in an experimental study. We're going to move forward now to the types of studies that evaluate the accuracy of diagnostic tests. And I'll tell you that in orthopedic surgery and sports medicine, these often take the shape of a physical exam finding, you know, the O'Brien's test or a Lachman test, or an MRI finding. You know, what does a Segond fracture mean? Or if you see a finding and I see a finding, what does that mean? And we'll talk about how we sort those out. So this falls under the category of screening tests, whether it's a radiologic study or a physical exam finding. And if you think, you know, not orthopedics, but just disease in general, if you look at this red line, it's the natural history of a disease. There's initiation of a disease early on. It could be a mutation. It could be an injury. It could be whatever it is. There will be a time very early in the disease where it could be detectable by screening before a patient presents with clinical symptoms and then maybe complications and then death. So the goal here would be to identify these patients before they have the onset of symptoms, where you could hopefully have an intervention. And that's the goal for screening. But in our world, it's not that heavy. And what we're trying to do is distinguish patients who have disease from patients who don't have disease. And again, I said it's usually imaging or physical exam findings. In order to do this, you have to understand how disease is distributed in human populations. Things are, and this is an oversimplification, either bimodal or unimodal. So this is an example of a bimodal disease. So I'm sure we all have to get a PPD, or in this case a BCG test, to work in a medical facility. They put an antigen under your skin. If your body's never been exposed to TB, then you have a very small induration, a few millimeters. Contrast that to people who have actually been exposed to the disease before. They have a very large induration. And you can see there's a very clear distinction between this population and this population. So in a bimodal distribution, it's pretty easy to discern. Here's an example from sports medicine. Way back, Dale Daniel described his early arthrometer, one of the early versions of the KT-1000. And what you can see here is this is a normal population, side to side difference in normal knees. The spike distributes right around zero. Knees should be symmetric if healthy. 
ACL deficient knees are distributed a little bit further to the right. And statistically, they found that if a patient has three millimeters of side to side difference or greater, that means statistically, with great confidence, they had an ACL tear. So if you superimpose these two graphs, you can see that's a bimodal distribution. That's easy to do statistically. Unfortunately, some things are unimodal. This could be the systolic blood pressure of everyone in this room. And it's a bell-shaped curve, and that gets a little bit trickier. So how do you define disease here? We have to have some biologic relevance, and that can be a little more challenging to do. So we're looking at screening tests. There are three ways we evaluate these. And I'll tell you when I think you need to start looking down and writing these things, but get the concepts here. How good is your test? How does it perform? And we test that by validity, and the tests are sensitivity and specificity. You'll need to know those. Should I use your test? What's the positive predictive value? If someone tests positive, what does that mean to me? Pretty important. And then lastly, if I describe this test, if I do the same test next week on a patient, is it reliable? Or more importantly, if I describe a test, can you reproduce it in your practice with accuracy? So those are the three types of studies we do. How good is your test? How valid is your test? When we talk about sensitivity and specificity, the question is, does your test have the ability to distinguish those with disease from those without disease? So I want you to get used to this concept of what's called a two-by-two table, because you may need to construct one or they may give it to you. So we're always comparing a known disease, and that's the gold standard, it's maybe a biopsy or something, to your test. It could be an x-ray finding, an arthroscopic finding. So when we build these tables, the columns are the truths: these patients have disease, these patients don't have disease. And the rows are the test results: these patients have tested positive, and on the bottom, these patients have tested negative. So for example, in this cell, you have patients who have the disease, right, they're in this column, but they've also tested positive. So these are true positives, as opposed to the ones below them: they do have the disease, meaning they're truly diseased, but they test negative. So this would be a false negative test, and I'll give an example of that and how you actually calculate it. So remember, sensitivity is your ability to detect true positives. Again, the probability that a person having the disease is detected by your test. And specificity is the ability to detect true negatives, the probability that a person who truly does not have the disease tests negative. And this is, again, another view of that two-by-two table. So we have whatever the gold standard is as the columns: these patients have disease, these patients don't have disease. And the rows are: your patient tests positive, or your patient doesn't test positive. So look at this. Sensitivity, again, is your ability to discern people who have disease. So of all the people that truly have the disease, that's this column, A plus C, how many test positive? Those are the A's. So how many of the people with disease test positive? That's A over A plus C, which will give you your sensitivity. 
Specificity is, of all the people who do not have disease, that's the negatives, the column here, B plus D, how many of them test negative? Again, that's the row here. So D over B plus D is your specificity. And here is an example that could be a real test question. We want to look at a diagnostic sensitivity and specificity of picking up ACL tears on MRIs. And we can say the gold standard is arthroscopy. You look at a knee, and you should be able to tell that patient has an ACL tear or not. And we want to see, is MRI good at doing this? So this is real data from a published article, and it doesn't really matter. But this could be a test question. They could say, what's the sensitivity of an MRI in picking up and discerning ACL tears? So this is real data. We know that in this particular population, 421 patients actually have ACL tears at arthroscopy. Who tests positive by MRI? These people. So the sensitivity would be, of all the patients who have ACL tears, 421, how many tested positive? 394. So the sensitivity is just 394 over 421, 94%. So the sensitivity of an MRI in discerning an ACL tear is very high, 94%. And that's how you use this. Then I mentioned how effective, or what's the efficacy of your test? Should I use it in my practice or not? So a positive predictive value means, what is the probability that someone who tests positive truly has a disease? And the corollary will be for the negative predictive value. And this is how you calculate this. So the positive predictive value. Everyone who tests positive, now let's go to the rows, right, this is the test results. Everyone who tests positive is A plus B. That's a number. A plus B, test positive. How many actually have the disease? That's this column, but it's actually this cell. So of all the people who test positive, A plus B, A have the disease. So A over A plus B is your positive predictive value. And if we go back to the ACL example, of all the patients who have a positive finding on MRI, 426, how many actually have the disease? 394. So the positive predictive value is 394 over 426, or 92%. So you can say you're 92%, you know, confident, and I use that term loosely, you're 92% confident that an MRI with a positive finding represents true disease. So that's how you use a 2x2 table. You know, just have a look at it before the test. You should be able to, they either give it to you or they'll give you very, very simple data and you can do your sensitivity, specificity, your positive predictive value. And the last category is reliability. How likely are you able to reproduce your results? How consistent are your results when repeated? And this either can be intra-subject, so you test a patient one day and you test them the next day. How much variation is there within a patient? But also more importantly, how much variation is there between me and you when we apply our tests? Inter-observer variation. And again, physical exam findings, MRI findings, and this is described as agreement. So this is the kappa statistic, and you don't need to know this necessarily, but you'll know how to perhaps interpret it. But agreement is measured by how much agreement we actually observe by a function of how much agreement just happens by chance alone. So you'll never be expected to do this calculation, but you should know that in the scientific literature, agreement of .75 or greater is considered to be excellent. And it's one of those rare times where a score of 75 is considered excellent. 
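Here is a small Python sketch of the two-by-two arithmetic just described. The true positive, false negative, and false positive counts come from the MRI-versus-arthroscopy numbers quoted in the talk (394 of 421 tears detected, 426 positive MRIs); the true negative count, the kappa example counts, and the function names are made-up placeholders.

```python
# Two-by-two table arithmetic: A = true positive, B = false positive,
# C = false negative, D = true negative.

def two_by_two(tp: int, fp: int, fn: int, tn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),  # A / (A + C): positives among the truly diseased
        "specificity": tn / (tn + fp),  # D / (B + D): negatives among the truly healthy
        "ppv": tp / (tp + fp),          # A / (A + B): diseased among those testing positive
        "npv": tn / (tn + fn),          # D / (C + D): healthy among those testing negative
    }

# TP, FP, FN from the ACL MRI example above; TN is a made-up placeholder.
print(two_by_two(tp=394, fp=32, fn=27, tn=500))  # sensitivity ~0.94, PPV ~0.92

# Cohen's kappa: observed agreement corrected for agreement expected by chance alone.
def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """a = both raters positive, b/c = disagreements, d = both raters negative."""
    n = a + b + c + d
    observed = (a + d) / n
    chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - chance) / (1 - chance)

# Hypothetical inter-observer counts: kappa = 0.70, just below the 0.75 "excellent" mark.
print(cohens_kappa(a=40, b=5, c=10, d=45))
```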
But that means there's 75% agreement beyond the agreement expected by chance alone. So this is how you interpret agreement studies. We're going to shift gears and talk a little bit about epidemiology now. And epidemiology is the study of populations, or how illness or health is distributed in populations. One of these two terms will appear on your test: prevalence and incidence. Think of prevalence as a snapshot in time, and think of incidence as having a rate, a time-dependent rate. So prevalence is the proportion of individuals within a population that have a disease at that point in time. So you can take a snapshot of the American population in 2001, and you can say the prevalence of obesity was 21%, or 21% of Americans at the time were obese. Snapshot in time. That's the burden of disease at that point in time. However, incidence includes a time or a rate. And that's the proportion of new cases of, let's say, ACL tears in a specific time interval. And it always includes time. So here's an example of that. Let's say you're interested in determining what's the incidence of ACL tears in skiers. How would you determine that? You sit at the bottom of a ski hill. Everyone goes by. Someone falls, tears their ACL. Now you have some data. Let's say you picked up 56 ACL tears in a six-month ski season in 500 skiers. Pretty simple data. So that means the incidence of ACL tears in this population is 56 injuries over 500 people over half a year. So you can see there's a rate. And you'll be tested on one or the other. How do we measure the importance of that? You'll hear something called the relative risk. And the relative risk is an expression of the magnitude of the association between the exposure at hand and the outcome of interest. So for example, how big of a risk is smoking in the development of lung cancer? So for the relative risk of smoking, you calculate the incidence of cancer in the exposed over the incidence in the unexposed. So the incidence of cancer in smokers over the incidence of cancer in non-smokers. Incidence in exposed, incidence in unexposed. Incidence of ACL tears in skiers, incidence of ACL tears in couch potatoes. And that defines a relative risk. You don't need to know how to calculate that, but you have to be able to interpret it. But think of this as a simple fraction. So if your relative risk is 1, that means it's going to be a number over the same number. That means that there's probably no association between the exposure, smoking, and cancer. If your relative risk, your ratio, is greater than 1, that means the numerator is greater than the denominator. That means smoking is more highly associated with cancer than non-smoking. So that means there's a positive association between the exposure and the risk. And a relative risk of less than 1, where the denominator is greater than the numerator, would be something protective. And you can put your own example there, but vitamin D and the risk of fracture could be a protective thing. Its relative risk would be lower. So just think about it as a numerator and a denominator, and how that changes with different numbers. The odds ratio is, for all practical purposes for you, exactly the same thing. It's a ratio of odds rather than of risks, but it's interpreted exactly the same way. An odds ratio of 1 means there's no association between the exposure and the disease. An odds ratio greater than 1 means there's a greater risk. I mentioned that when we have an incidence, there's this factor of time. And I also said that when we look at case control series, we're looking backwards in time. 
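Before moving on, here is a short Python sketch of the incidence, relative risk, and odds ratio arithmetic above. The skier numbers (56 tears among 500 skiers over half a year) come from the talk; the comparison-group rate and the odds-ratio counts are made-up placeholders.

```python
# Incidence as a rate (new cases per person-year) and the ratios built from it.

def incidence(new_cases: int, population: int, years: float) -> float:
    """New cases per person-year of follow-up."""
    return new_cases / (population * years)

skier_rate = incidence(new_cases=56, population=500, years=0.5)  # ~0.224 per person-year
couch_rate = incidence(new_cases=4, population=500, years=0.5)   # hypothetical comparison group

relative_risk = skier_rate / couch_rate  # 1 = no association, >1 = positive, <1 = protective
print(skier_rate, relative_risk)

# Odds ratio from a 2x2 exposure-by-outcome table, interpreted the same way.
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """a/b = exposed with/without disease, c/d = unexposed with/without disease."""
    return (a / b) / (c / d)

print(odds_ratio(a=3, b=2, c=1, d=4))  # hypothetical counts: odds ratio of 6
```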
So when we use case control studies, for that reason, we use odds ratios. But it doesn't matter. Just know what a ratio of 1, greater than 1, or less than 1 means. Now let's talk about evidence in orthopedic surgery and in clinical medicine. This is the buzzword: evidence-based medicine. What does it mean? It means that we're all obligated to take the current best evidence and apply it to our medical decision making. And then we have levels of evidence. So this is the JBJS levels of evidence. We know that the highest level of evidence, for reasons we discussed, will come out of a good quality randomized controlled trial or a systematic review or pooling of those trials. And as you go down this tier, down to a level 5, that means, what's your opinion? Expert opinion is the lowest level of evidence. So sometimes we don't have access to a high level of evidence, and we make medical decisions based on our experiences or those of experts. And sometimes we have really good quality data, like level 1 or level 2 studies. So what would be an example of these types of things? So let's just say for whatever reason you need to do the millionth study looking at hamstring versus patellar tendon autografts. So a level 1 study would be a randomized trial. So we start the study today, and we randomly distribute our patients between hamstring and patellar tendons. Remember, that's good evidence, because the million things we know about and don't know about get distributed equally. And then we can truly discern the difference between these two grafts or the effect of the grafts. Well, that's not always possible. A level 2 study would be a prospective comparative study. Let's say you want to do this study, but you can't randomize your patients, because it's hard to do in a surgical population. So you basically start collecting data today on your hamstring and your patellar tendon patients, and you follow them over time. That's a prospective cohort or comparative study. That's pretty good evidence. You could do that retrospectively as well. You can look back at your practice and see how your hamstrings performed over time versus your patellar tendons. That's full of a few biases. And it's not relevant why, but you can imagine doing things in a meaningful way systematically moving forward is a little bit better. And then lastly, there would be a case series. A case series is not comparative. You can report on your hamstring patients. You can report on your patellar tendon patients. But it's not meant to be comparative in any way. And then expert opinion is simply, we don't have access to better data. So Chris Kaeding is an expert in ACL surgery. We're going to rely on his expert opinion to guide our treatment. All right, this is the last part of this talk, talking about statistical analysis. I'm going to make this very clear and simple. And I'll show you where the test questions come from. So there are two types of, we'll call them stats, statistical approaches to data. One is simply descriptive statistics, where you describe the data: mean, median, mode, standard deviation. You're not applying any tests. You're just describing it. And I'll have a slide next to discern what that means. The other category is inferential statistics. And that's where you apply a statistical test. And what we're trying to do is determine if there is a difference between the two populations. And we're always looking for differences. 
And these tests help us decide if the difference between these two populations are likely to be real, or if they're more likely to just be a reflection of chance alone. And those are the two categories of statistics. So let's talk about descriptive statistics. Mean, median, mode. So let's take, if we took everyone in this room, and we ask you to line up along that wall in age order, youngest to oldest, OK? The average age would be, I think you know the average, everyone's age, added up, divided by the number of people in line. So that's your mean. The median would be that person right in the middle of the line. So we line everyone up, and 49% or 50% of the people on either side, that's the median age. And then lastly, the mode is the most common value. So let's just say we had 17 people here that happened to be 37. We could have a population that looks all over the place, but the mode, the 37-year-olds, whatever, could skew our data. So a normal distributed population would look like this bell-shaped curve, where the median, the mode, and the mean kind of all are superimposed. That's not very common. To get that, you've got to look at very, very, very, very big populations. But more likely in human populations, especially the things we look at, we have some skews. So for example, if we had a lot of 37-year-olds, that would skew our data one way relative to the mean. So don't beat yourself up. Just know that when they test that, it's usually the mode. The mode is the most commonly repeated value. Now let's talk about standard deviations. A standard deviation is the numerical way we have of describing variability or variation in our population. So how well is our data spread out? So if you go back to this slide, this is a bell-shaped curve, and it's pretty tight. So if you look at board scores in part one of medical school, it's a very, very tight distribution, and it may look like this, bell-shaped curve. This is also a bell-shaped curve, but it's a little more broadly distributed. So we use a standard deviation to kind of distinguish how this data is distributed. The most important thing you need to know is 95% of your data is going to be distributed within two standard deviations of the mean. What that means is if you have, here's zero, there's your mean. If you go to this way or to this way and color that area under the curve, that represents 95% of your data. And so really, in statistical analysis, we're trying to find out or exclude those patients that are outliers, that 5% on either side. So just remember that two standard deviations is 95% of your population. And we use that when we determine what's called confidence intervals. So I think you'll read a paper. Very often, you'll see an author will describe a mean plus confidence intervals. And what does that mean? It means they take the mean and add two standard deviations either way. So let's look at this bell-shaped curve here. Let's say we took everyone in this room and asked you to shake out your wallet, count your cash, and we did a statistical analysis. And we found that on average, so the mean amount of cash in your wallet was $40 for this population. And the confidence interval is $37.33 to $42.67. There's a mean and a 95% confidence interval. What does that mean? How do we apply it? Let's just say next year at this time, we have a pretty similar population, or we take a population, a subset of the people in this room, and measure your money. 
We can say that based on this data, I'm 95% confident that your wallet contains somewhere between $37 and $42. So it's our way of describing the distribution of a population. And again, we accept a 5% error rate, that false positive rate. We accept a 5% chance that we're not capturing everything. That's why I'm saying I'm 95% confident that we'll fall within two standard deviations. I'm going to finish up with two concepts. So a favorite test question is, what statistical analysis would you use for this particular data set? And I'm going to grossly simplify this for you, and hopefully oversimplify it. OK, so all you need to know is what type of data you have and what is the nature of your data. And I have some algorithms. And what I mean by that is, your data is either continuous or categorical or discrete. A continuous variable is something that can be measured, 0 to infinity: height, weight, strength, age. It's a number. A categorical or discrete variable is just that. You can be Outerbridge type 1, 2, 3, or 4. Those are discrete variables. You can have arthritis, yes or no. You can be male or female, yes or no. So there are no continuous variables in there. So you need to know what kind of data you have. And that's pretty straightforward. And then you need to know if your data is normally distributed or not. And don't worry about that one, because they will tell you in the test question. They'll either say you have a big population that's normally distributed, or it's not. And then the question will be, which test do you apply? And this is how you answer that question. There'll be one or two of these. So I'll talk you through this. And I'll go through two or three examples. So is your data continuous? If yes, you go down here. If it's discrete, you go down here, and you stop and you do a chi-square test. That one's easy. If your data is continuous, height, weight, age, then they'll tell you, is it normally distributed? Yes or no? They'll usually say yes. So you go down here, normally distributed data. And then the next question is, how many experimental groups do you have? You either have one, two, or more than two. And that'll determine it. It's as easy as that. So let's see how this is applied. Here's a test question. What statistical analysis would be used to test the hypothesis of equal weight, and you can put any continuous measure here, in two groups tested in the laboratory? The data are distributed normally. So they tell you that. And they list all these tests. Don't worry about the name of the test. Just remember the two or three at the end of this algorithm. So I've highlighted this path. So we know it's a continuous variable. So we're going this way. They tell you it's normally distributed. And it almost always is in a test question. So you go down here. And I'm only testing two groups. So what's our test going to be? A, t-test. Simple. You don't even need to know what it means or how to do it. Just follow that algorithm. Here's another question. You are testing the stiffness of four reconstructions in a lab. Again, you're doing a lot of cadaver work. So the data is normally distributed, as it always will be in these tests. So what type of data do we have? Stiffness, that's a continuous variable. And we have four different groups that we're testing. So what do we do? We have a continuous variable. Yes, it's normally distributed, because they tell us it is. And how many groups do we have? We don't have one. We don't have two. We have more than two, because they said four. 
So you do an ANOVA, or analysis of variance. Again, that's all you need to know for this test. Last example, this is the easy arm. Two groups of patients have been treated, and each patient's outcome has been graded as poor, good, or excellent. Is there an association between treatment and outcome? So what kind of data do we have? Poor, good, or excellent is categorical data. It's not a continuous variable. So go down this arm, and you always do a chi-square test there. And so if you remember the t-test and the ANOVA: the t-test is comparing one group or two groups, an ANOVA is greater than two groups, and a chi-square is for categorical or discrete data. Then you're going to get 99% of these. And every now and then, they'll show you these funky names of things you've never heard before. Just don't pick that one, because they rarely test it. So what we learned there are the parametric tests, and those tests, the t-test and ANOVA, are designed for normally distributed data, big populations. If for some reason you have a very small sample size and the data is not normally distributed, then for each of these parametric tests, there's a non-parametric test with a funky name. I just think, in the interest of guessing, don't guess one of those. The answer will be one of these: a t-test, an ANOVA, or a chi-square. And I'll end with how to interpret a p-value. I think we read a paper, and we're reading the abstract, and we're just focusing on p-value, p-value, p-value. What does it mean? So by convention in our literature, and in most scientific literature, a p-value of less than 0.05 is considered statistically significant. What does that mean? That means if a p-value is less than 0.05, there's less than a 5% chance that the observed difference between two groups was due to chance alone. So if you're 95% confident that it wasn't due to chance alone, then the difference was clearly attributed to your treatment or intervention. So that's what a p-value means. You're 95% confident that the difference that you see is due to your intervention and not chance alone. And that's all it means. And I would not expect anyone to read any of these, but if for some reason we piqued your interest, in the late 90s the AJSM, via the AOSSM, published a number of statistical primers. They were authored by Jed Kuhn and Ed Wojtys, and they're basically introductory papers on statistical methods for orthopedic surgeons. And they're basic, and they're really easy to read if you're interested. So that's a pretty good resource. And with that, thank you very much for your interest. Thank you.
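To close the loop on the test-selection algorithm and the p-value interpretation above, here is a minimal Python sketch using scipy; the data are randomly generated stand-ins, not anything from the talk.

```python
# Decision rule from the talk: continuous + normal + two groups -> t-test;
# continuous + normal + more than two groups -> ANOVA; categorical -> chi-square.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two groups, continuous, normally distributed -> independent-samples t-test.
group_a = rng.normal(loc=100, scale=10, size=30)
group_b = rng.normal(loc=108, scale=10, size=30)
t_stat, p_two_groups = stats.ttest_ind(group_a, group_b)

# More than two groups, continuous, normal -> one-way ANOVA.
group_c = rng.normal(loc=95, scale=10, size=30)
group_d = rng.normal(loc=112, scale=10, size=30)
f_stat, p_many_groups = stats.f_oneway(group_a, group_b, group_c, group_d)

# Categorical outcomes (poor / good / excellent) in two treatment groups -> chi-square.
counts = np.array([[10, 25, 15],   # treatment 1: poor, good, excellent
                   [20, 20, 10]])  # treatment 2
chi2, p_categorical, dof, expected = stats.chi2_contingency(counts)

# p < 0.05: less than a 5% chance the observed difference is due to chance alone.
for name, p in [("t-test", p_two_groups), ("ANOVA", p_many_groups), ("chi-square", p_categorical)]:
    print(f"{name}: p = {p:.3f} ({'significant' if p < 0.05 else 'not significant'})")
```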
Video Summary
The video transcript consists of a presentation on sports medicine research and statistical analysis. The speaker discusses the importance of reading journal articles and being a sophisticated consumer of research. They cover topics such as the expected size of effect, clinical relevance, statistically significant differences, the minimal clinically important difference (MCID), control groups, outcome measures, sample size estimation, power, errors in statistical analysis, diagnostic test accuracy, sensitivity and specificity, positive predictive value, agreement studies, levels of evidence, and statistical analysis for continuous and categorical data.

The speaker emphasizes the need to understand these concepts in order to interpret and apply research findings in clinical practice. They provide examples and algorithms to guide the choice of statistical tests based on the type and nature of the data. The presentation also touches on the interpretation of p-values as a measure of statistical significance.
Asset Caption
Bruce S. Miller, MD, MS (University of Michigan)
Meta Tag
Author
Bruce S. Miller, MD, MS (University of Michigan)
Date
August 11, 2018
Session
Title
Sports Medicine Research
Keywords
sports medicine research
statistical analysis
journal articles
clinical relevance
sample size estimation
diagnostic test accuracy
levels of evidence
p-values interpretation
statistical tests