2019 Orthopaedic Sports Medicine Review Course Onl ...
Sports Medicine Research
Video Transcription
We'll get started. I'm Bruce Miller, a sports medicine surgeon at the University of Michigan. I take care of our football team, we're in camp this weekend, and I've got to get back like the rest of you. Those who stay for this last talk on research will be greatly rewarded: every year there are about two to five questions on the exam from this material. It's not terribly high yield, and if you listen you'll get those questions right, but more importantly, whether you're doing research or not, we're all consumers of medical literature and medical communications, and I'd like to talk you through my approach to evaluating the literature in a scientific way. I think it'll help your test prep dramatically for those two to five questions, but I think it'll also make you a much more discerning reader of the literature. I have no disclosures. This is the outline of topics we're going to cover, and for many of them I'm going to use real examples from the literature to drive home some points.

When I'm either planning a clinical trial or reading an article on a clinical trial, I go through a number of checklists in my mind, and I'll take you through those. First, there's the concept of the expected size of effect. We all want our work to be impactful, we want it to be the shot heard around the world, so the question is: whatever your intervention is, is it strong enough to produce a detectable effect? And if it truly is detectable, is it statistically significant? Maybe, maybe not. But more importantly, is it clinically relevant? We'll talk about how we determine that.

Here's an example from the literature. Christian Gerber wrote a landmark article on latissimus dorsi transfers for irreparable rotator cuff tears. The take-home message was that external rotation increased from 22 degrees to 29 degrees. That's seven degrees, and it's statistically significant; they tell you the P is less than 0.05, and we'll learn what that means in a little bit. But if you look at me up here, I'm going from 22 to 29 degrees, and if you can't see my arm moving, it's because it's not moving very much. So in my mind this is an operation that's a really, really long run for a really short slide, and I have elected not to adopt it in my practice. I think this is a good example of something that may be statistically significant but is perhaps clinically irrelevant.

How do we figure out what's clinically relevant in the literature? There's something called the minimal clinically important difference, MCID or MID, and in short, it's a threshold over which your intervention is clinically relevant. For any outcome measure you may use, there's basically an MID or MCID, and it's just a number: you start with a baseline value for a patient and add the MID for that outcome measure, and that's how you determine whether something is clinically relevant. I'll talk you through an example. I study rotator cuff disease and have a very large database. This slide represents two populations of patients who have had rotator cuff tears, and this is the ASES score over time. The orange line represents the patients who have had surgery.
I'm sorry, your books are not color-coded, but the blue, flatter line represents patients who were treated non-surgically, and I can tell you with certainty that both populations improved in a statistically significant way from start to finish. But is it clinically relevant? This is how the MCID comes into play. I told you that for the ASES score the MCID is about 20 points. Our surgical group starts at a baseline here around 50; I add 20 points to that, and that defines this threshold line. You can see that the orange line, the ASES score, crosses that line and stays above it, so not only is this a statistically significant improvement, it's a clinically meaningful one. Now contrast that with the non-surgical group. I told you they also had statistically significant improvements, but their baseline here is just under 60; I add my MID of about 20, and that defines our threshold. Although the blue line is improving, they never cross that threshold; they don't even get close. So this is an example of a statistically significant improvement that may not be clinically relevant, much like the tendon transfer example I gave you. So that's one issue.
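To make that MCID arithmetic concrete, here is a minimal sketch in Python. The MCID of roughly 20 points and the baselines of about 50 and just under 60 are the values quoted above; the follow-up scores are invented purely for illustration, not data from the actual database.

```python
# Minimal sketch: is an improvement clinically meaningful?
# Assumed illustrative values: ASES baselines of ~50 (surgical) and ~58 (non-surgical),
# MCID of ~20 points; follow-up scores are made up for this example.

MCID = 20  # minimal clinically important difference for the ASES score, per the talk

def clinically_meaningful(baseline, followup, mcid=MCID):
    """True if the change from baseline meets or exceeds the MCID threshold."""
    return (followup - baseline) >= mcid

# Surgical group: baseline ~50, follow-up ~78 -> crosses the threshold
print(clinically_meaningful(50, 78))   # True: statistically AND clinically significant

# Non-surgical group: baseline ~58, follow-up ~70 -> improves, but not by the MCID
print(clinically_meaningful(58, 70))   # False: may be statistically significant only
```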
The next issue is applicability. Is your intervention going to be implemented in clinical practice, or is the paper you're reading meaningful enough for you to adopt into your practice? I do a lot of ACL surgery, and I find this article, Indwelling Femoral Nerve Catheters for ACL Surgery. If pain control is a big enough problem in your life or your practice that you'd consider adopting this, then do the research study, or read the paper. If not, forget about it, move on, and don't waste your time and energy and that of everyone else.

There's also the concept of external validity: does the study population reflect the greater population at large? Again, I study rotator cuff tears. I'd love to study every single person in North America with a rotator cuff tear; they're represented by the big box of people. I can't, that's not practical, so I take a smaller study population, the smaller box, do a study, and then draw a conclusion that hopefully reflects the greater population at large. If it's a good study, it has external validity. Where does that fall short? In my case, I live in a small Midwest college town with a kind of funky population, so my findings in Ann Arbor, Michigan may not be particularly applicable to your practice in, say, San Francisco, California. Here's an example from the literature: I'm a team physician, I'm interested in hamstring injuries and knee strength, but I know absolutely nothing about Gaelic football, zero, so this particular study is far enough removed from my practice in terms of external validity that I don't know that I can apply it.

Okay, so whenever we're studying an intervention, treatment A versus treatment B, we should have a treatment group and a control group, and the key is that both groups should ideally look alike in every way possible and differ only in the intervention. That's the best way to test an intervention. So these are the Barber twins. If you wanted to study the latest and greatest interference screw versus a gold standard metal screw, an identical twin study would be the way to go; that's a great control group. It may be unethical, but that's my point: you should be controlling your studies. And this is a list of control groups from worst to best. I'm not condemning the use of literature or historical controls, because sometimes that's all you have available, but you can see, marching down the list, the best source of a control is random assignment, and I'll explain why later. Basically it controls for every variable we know affects outcome, but it also distributes all the things we may not be thinking about equally between the two treatment groups.

And then you have to know your outcome measures in the literature. I don't know all of them; I know the ones that I study. So if you're reading a paper, make sure it's a validated outcome, and get a sense of what the MCID is, so you'll know how that outcome changes in a population for that particular measure.

Whenever you start a study, we do what's called a sample size estimate, and that basically tells you how big the study needs to be in order to reach a meaningful conclusion. It's kind of like Goldilocks: when you plan, the study can be too small, too big, or just right. If your sample size is too small, and I'll argue that most orthopedic studies are, you're not studying enough people, and if you're not studying enough people, you're probably not going to find a difference even if one exists; that's the concept of power, and we'll talk about it on the next slide. If you study too many people, you're going to find some associations that are spurious, that just don't matter. Remember, in the literature we accept about a 5% false positive rate, so if you study 20 variables, one of them is going to come up positive just by virtue of being a false positive alone. So don't study too many things, study just the right amount, and determine what that sample size is before you launch your study.

Here's an example from the literature, from JBJS. Gary Gartsman is a preeminent shoulder surgeon in the U.S. I think we now know, as Tom Gill taught us, that a total shoulder arthroplasty outperforms a hemiarthroplasty for arthritis, but at the time this was going to be the gold standard study, a beautiful study: he took patients to surgery, opened up the shoulder, and only then opened an envelope to determine whether they would get a hemiarthroplasty or a total shoulder arthroplasty. He did his homework and decided he needed 35 subjects per group, based on his sample size estimate. However, if you read the paper, when the dust settles he only studied 51 patients. So guess what? Most of the comparisons in this trial were inconclusive. Not because he didn't do a good study, but because he didn't study enough people. So stick to your guns, do your homework, and then study everyone you need to. So that's the concept of power.
And this particular concept does appear on the test, so I'll identify the things that I think show up routinely. Power is the probability of identifying a difference when one truly exists. In statistics, what we're trying to do is take a population, or two populations, and determine whether there's a difference or an association between those groups of people or things. By convention, we use a power of 80%. That means 20% of the time there will be a false negative. There's a mantra to remember: there's power in numbers. The more subjects you include, the more numbers you have in your study, the more likely you are to find a difference if one truly exists. So we accept a 20% false negative rate in the orthopedic literature.

Type 1 and type 2 errors. When we do a statistical analysis, we accept a 5% false positive rate; that's a type 1 error, a false positive. And we accept a 20% false negative rate: there is a difference, but we don't find it; that's a type 2 error. You can look at this two-by-two grid, but I'll show you how to actually interpret those in a minute.

So when we're starting a study, we want to make sure we have just the right number of subjects, and there are only three or four variables that go into the equation. I'll talk you through an example. I study rotator cuff disease, and I want to do yet another study looking at arthroscopic versus open rotator cuff repair. I know my outcome measure, the WORC score; it's validated, and I know how it changes in this population. Based on some pilot data, I know that the mean score after an intervention should be about 1,000, and the standard deviation, how much variability there is in this population, should be about 450. Again, just an example. So when we're planning our study, I need to know the effect size: how much of a difference do I want to find, or think I'll find, between these groups? I need to know the variance: how much heterogeneity is there in the population at baseline? And then error and power, but I told you those are pretty much standardized in our literature.

So how does treatment effect size affect the sample size, the number of subjects I need in each group? The greater the effect size you'd like to detect between two groups, the smaller the sample size you need. Everything else being equal, if I think I'm going to see a 50% difference in my WORC score between treatments, I only need 14 subjects per group. Very manageable. If I'm looking for very small differences, in this case 10%, you can see I'll need 319 per group, about 640 subjects overall. A very, very big difference. These numbers become important when you're planning a study. How does variance in your population affect your study size? Basically, the greater the variance, the larger the sample size you need. Again, a small example: if you have a very tight population, a standard deviation of 100, you need very few subjects. If you have a population whose variability is all over the place, 700 is my standard deviation here, then I need a lot more subjects. And then power. I told you there's a standard for power in our literature, 80%. So how badly, this is your lifetime work, how badly do you really want to detect a difference if one really exists? The greater the power, remember, power in numbers, the larger the sample size you need.
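The arithmetic behind those per-group numbers can be sketched with the standard two-sample formula for comparing means: n per group ≈ 2 × (z_alpha + z_beta)^2 × (sigma / delta)^2. This is a minimal sketch, assuming a two-sided alpha of 0.05 and the pilot values quoted above (mean around 1,000, standard deviation 450); it is not necessarily the software the speaker used, but with a common small-sample correction it lands on essentially the numbers he quotes (about 14 per group for a 50% difference, about 319 per group for a 10% difference).

```python
# Minimal sketch of a two-sample-means sample size estimate.
# Assumptions: two-sided alpha = 0.05, normal approximation, Guenther small-sample correction.
# sigma = 450 and a mean of ~1000 are the pilot values quoted in the talk.
import math
from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a mean difference `delta` (two-sided test)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # ~0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2
    return math.ceil(n + z_alpha ** 2 / 4)   # small-sample correction for the t-test

print(n_per_group(delta=500, sigma=450))              # ~14 per group (50% of a mean of 1000)
print(n_per_group(delta=100, sigma=450))              # ~319 per group (10% difference)
print(n_per_group(delta=100, sigma=450, power=0.90))  # higher power -> more subjects per group
```

The same formula also explains the power example on the next slide: moving from 80% to 90% power multiplies the required sample size by roughly (1.96 + 1.28)^2 / (1.96 + 0.84)^2 ≈ 1.34, which is about how 52 subjects per group becomes 70.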
So I'll take the standard, and this is 80% power, meaning I'm willing to accept the 20% false negative rate: I'll take 52 subjects per group. But if this is my life's work and I don't want to miss a finding, then I'll have to increase my power, and as you can see, instead of 52 in a group, I'll take 70 in a group. And that's it. It's a simple calculation to do before you start your studies. I'll just tell you, most orthopedic studies don't show a difference. Why? They're usually underpowered. And if you want to play around with these numbers, there's some free software online if you're so inclined.

All right, we're going to talk about study design. There are two types of studies, and this may be a question. There's an observational study, where you're basically observing a population and describing what you see. Or there's an experimental study, think of a lab study, for example, where you're actually doing an intervention and seeing what the result is before and after. And these observational studies are either retrospective or prospective. I think this is simple: if you start your study today and you're looking at data collected in the past, or events in the past, that's retrospective. Seems silly, but they ask that question. And if you start your study today and you're collecting data moving forward, that's a prospective study. There are not very many types of observational studies. Our literature is littered with case series, small, underpowered, descriptive things, which is okay. But there are two study designs they'll ask you about, the case control study and the cohort study, and I'll spend a slide on each.

A case control study is good for looking at a rare disease. Let's just assume we go back in time to the 1950s, when we didn't know there was a really strong association between smoking and cancer. You're in your practice and you're thinking, I'm starting to see more and more cancer patients; I wonder why that is. So you capture all your cancer patients, those are your cases, and you want to compare them to a control group. What you want is a control group that looks like your cases in every way, and the only thing that differs is they don't have cancer. So age, gender, ethnicity, employment, geographic location, whatever it is: think of as many things as you can. Then you look back retrospectively at their lives, you ask questions, and you try to figure out what their exposures were. You take their history and you find out that, guess what, looking back in time, three of these five cancer patients were actually smokers, and of our non-cancer patients, none were smokers. So then you start thinking, hey, maybe there's an association between smoking and cancer, and you can do a statistical test to define that association. That's a case control study, good for looking at rare diseases.

A cohort study is a more common study design in orthopedics; I'll use an example of a prospective study. Let's say you want to know whether smoking causes cancer. You have this group on the top, the smokers, and a group at time zero of non-smokers, and you follow them over time, and you see that three of the five smokers get cancer and only one of the five non-smokers gets cancer.
So in this prospective cohort, we'll do an analysis, and it will suggest that there's probably an association between the exposure, smoking, and the disease, cancer. That's a cohort study. Those are observational studies.

An experimental study is one where you're doing an intervention, and the gold standard for an experimental study is a randomized clinical trial. I told you why: we have our biases. I'll use the example of who gets arthritis after ACL surgery. What do we know? We know about meniscus status, obesity, activity level, age, all those things, but there are also probably a million things we don't know about, genetic predisposition, who knows what. So we take our population here, all ACL-deficient patients, and we want to compare two treatments, let's say hamstring versus patellar tendon. You randomly assign these patients to one treatment, hamstring or patellar tendon. Why? Because all those known and unknown variables that may influence the outcome are now controlled between the groups, and the only place you're truly going to find a difference, hopefully, is in the effect of your intervention, the ACL reconstruction. So you do your research, and guess what: in your hamstrings, one out of five has arthritis, and in your patellar tendons, three out of five have arthritis, so maybe you can say patellar tendon grafts are associated with an increased risk of arthritis. And that would be a really good trial, because it's randomized, and that takes all the biases out of treatment allocation.

Moving on to studies that evaluate the accuracy of diagnostic tests. What's the value of a diagnostic test? Think about the natural history of disease, going from the left side of the screen to the right. There's initiation of disease, an injury in sports, or it could be a gene mutation, and then it leads to detectable disease and maybe ultimately death. The value of a screening test is that it allows you to identify the disease before it becomes clinically relevant, so you can, hopefully, intervene. That's what testing, or screening, is all about. A diagnostic tool, again, is meant to discern those who have disease from those who don't, but to understand that, you have to understand how disease is distributed in human populations. Distributions are either bimodal or unimodal. Here's an example of a bimodal distribution. We all work in medical facilities, so you have to get PPD tests for tuberculosis. The spike on the left is people who have never seen the antigen before, have never been exposed, and therefore have no induration; this is in millimeters. The population on the right is people who have seen the antigen before, have been exposed, and have a big induration. This is really clear cut; it's not very difficult to discern one population from the other.

Here's an example from sports medicine. In 1985, Dale Daniel published on the first knee arthrometer, trying to screen for ACL tears. This is millimeters of side-to-side difference, and you can superimpose this spike down in here, and you'll see that you're actually looking at two very different populations.
The group with a normal ACL had pretty much no side-to-side difference in translation, but the patients with ACL insufficiency had their distribution spread down to the right side, so they had greater translation. Then you can apply some tests, and ultimately it was concluded that if you have more than three millimeters of side-to-side difference, we're pretty confident you have an ACL tear. So that's an example from sports medicine. How about if it's unimodal? This is systolic blood pressure. It's really hard to take this curve and decide whether someone is hypertensive or not. So then you have to set a cutoff. You can do it statistically, but ideally it should be biologically driven, and that's not something you need to worry about.

So this is a concept you'll be tested on, and I'll tell you why in a minute. We take a test, an MRI, the O'Brien's test, and we want to know how good it is at identifying disease. There are three ways we measure that. How good is your test, how valid is it? We measure that by sensitivity and specificity; people have been talking about those all week, and I'll tell you exactly what they mean. Should we use your test, what's its efficacy, how valuable is it to me in predicting disease? And then how reliable is your test: would we get the same answer if we both did the test, or if a subject does it on day one and then day ten? Is it a reproducible examination?

Validity is the ability to distinguish those who have disease from those who don't, and I just want you to look at this two-by-two table. In order to know whether someone has disease, there has to be a gold standard. Let's say biopsy is the gold standard for cancer, so we biopsy a bunch of patients. This is a two-by-two table. For the patients in this column, the disease gold standard is present, a positive biopsy: these are people who have disease. And these are biopsy negative, so they don't have disease. Patients in this corner had a positive test result; let's say we're doing a blood test for cancer. They're true positives, because we know they have the disease from the biopsy and they test positive. This corner is patients who had a negative biopsy, so we know they don't have disease, and whatever your blood test is, they also tested negative: that's a true negative. And then we can have false positives and false negatives. That's how you determine sensitivity and specificity, and I'll take you through a few examples, because every now and then on these tests they give you some data and ask you to calculate the sensitivity and specificity, and it's really easy.

Sensitivity is the ability to detect true positives: the probability that a person who has the disease is detected as positive. Specificity is the ability to detect true negatives: the probability that someone who does not have the disease tests negative, as you'd expect. And there's that two-by-two table again. Sensitivity is about the true positives. Of all the patients who truly have the disease, those positive biopsies, A plus C, how many tested positive? A. So the sensitivity is A over A plus C. Specificity is being negative in health: of all the people who don't have the disease, B plus D, how many tested negative? D. So the D cell over B plus D is your specificity.
And I'll hammer it home with an example, because I do ask this every now and then. How good is MRI at detecting ACL tears? Let's say the gold standard is arthroscopy, and let's say you're 100% confident you can tell an ACL tear at arthroscopy. Here's some data; it comes from the literature out of Boston. What's the sensitivity and specificity of MRI for diagnosing an ACL tear? These are real numbers. Remember, sensitivity is: of all the patients with the gold standard finding, the arthroscopically confirmed ACL tear, this column, 421, how many actually tested positive? 394. So 394 over 421 is the sensitivity, and that's 94%, pretty darn good. That means you're going to pick up 94% of your ACL tears on MRI. And specificity, you know how to do it. So that's sensitivity and specificity, the validity of the test.

Then efficacy: how valuable is it in my practice? We measure that by the positive or negative predictive value, and I'll use the same squares and the same data. But look here, please: now we're evaluating everyone who tests positive, the A's and B's. Of all the people who test positive, A plus B, how many have the disease? Those are the A's. So A over A plus B is your positive predictive value, and the negative predictive value is the corollary. Looking at the MRI data again: what's the positive predictive value of MRI for an ACL tear? A, 394, over A plus B, 426. So the positive predictive value is 92%. What does that mean? If you have a positive MRI, you can be about 92% confident that the patient truly has an ACL tear. That's a pretty good positive predictive value. I think most likely they'll just ask you to calculate sensitivity, maybe specificity, but if you know one, you can get the other, and they'll give you all the numbers you need.

And then how reliable is your test, how repeatable is it? There's intra-observer variation: if I look at an MRI today and look at it again next week, do I agree with myself? And there's inter-observer agreement: if I look at an MRI and you look at the same MRI, how often do we agree? You don't need to know the statistic, but it's called the kappa statistic, and you should be aware of what it is. You certainly don't need to know how to calculate it; you wouldn't be expected to do that. Basically, it's a mathematical assessment of the agreement between two observers compared with what would be expected by chance alone. This is how you interpret it: a kappa of 0.75 or greater is considered excellent agreement. That means we agree on something 75% beyond chance alone, so that would be a very, very strong agreement.
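Here is a minimal sketch that runs those MRI numbers through the two-by-two definitions. Only the cells quoted in the talk are used (394 true positives, 421 patients with arthroscopically confirmed tears, 426 positive MRIs); the true-negative cell isn't quoted, so specificity is left out rather than invented.

```python
# Two-by-two table cells, using the convention from the talk:
#   A = true positives, B = false positives, C = false negatives, D = true negatives
A = 394          # positive MRI and torn ACL at arthroscopy
disease = 421    # everyone with a torn ACL at arthroscopy (A + C)
test_pos = 426   # everyone with a positive MRI (A + B)

C = disease - A      # 27 tears missed by MRI
B = test_pos - A     # 32 positive MRIs without a tear

sensitivity = A / (A + C)   # of those with the disease, how many test positive
ppv = A / (A + B)           # of those who test positive, how many have the disease

print(f"Sensitivity: {sensitivity:.0%}")   # ~94%
print(f"PPV:         {ppv:.0%}")           # ~92%
# Specificity would be D / (B + D), but the D cell isn't given in the talk.
```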
Now we'll shift gears to epidemiology. These two terms, one of the two will probably be on the examination, and it's really, really simple. Here we're talking about a disease and how it's distributed in a population: prevalence and incidence. Think of prevalence as simply a snapshot in time, the proportion of individuals in a population who have the disease of interest. The prevalence of obesity in American adults in 2001 was 21%. You take a snapshot in time and you determine your prevalence, the disease burden. Incidence differs because incidence is a rate; it has a time variable. It's not a snapshot. Incidence describes the proportion or rate of new cases of a disease within a specific time interval. For example, if you want to know the incidence of ACL tears in skiers, you park yourself at the bottom of a ski hill and count the number of ACL tears over a given day or a given season, and then you can define the incidence of ACL tears. In this skiing population it was 56 injuries in 500 people, and I only studied them for half a year, so that's 0.22 injuries per person-year. The point is that incidence has time as a factor; it's a rate. That's the difference between the two.

So how do we measure the association between an exposure, skiing, and the risk of a disease, an ACL tear? We do it with relative risks and odds ratios, and essentially you can think of them as the same thing. A relative risk is the incidence of the outcome in the exposed over the incidence of the outcome in the unexposed: the incidence of an ACL tear in skiers over the incidence of an ACL tear in non-skiers, for example. What we're trying to say is: what's the relative risk of sustaining an ACL tear with the exposure of skiing? It's a ratio, and how do we interpret a ratio? If the ratio is 1, the numerator and denominator are the same, and there's no association: no difference between skiers and non-skiers. If the ratio is greater than 1, there's a positive association: an increased risk of ACL tears in those exposed to skiing. And if it's less than 1, whatever the exposure is, it was protective: milk reduces fractures, something like that. I told you relative risks and odds ratios are essentially the same, and they are. The only difference is that you use the odds ratio in a case control study, because it's a retrospective study and you don't have a time variable. So don't worry about that; they're both reported the same way, and I'll go back one slide: 1, no association; greater than 1, a positive association; less than 1, a negative association.
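A quick sketch ties the incidence and relative risk arithmetic together. The skiing numbers (56 injuries in 500 people over half a year) are from the talk; the smoker and non-smoker counts reuse the five-and-five cohort example from a few slides back, purely for illustration.

```python
# Incidence rate: new cases per person-time
injuries = 56
people = 500
years_followed = 0.5
incidence = injuries / (people * years_followed)
print(f"Incidence: {incidence:.2f} ACL tears per person-year")   # ~0.22

# Relative risk: incidence of the outcome in the exposed / incidence in the unexposed.
# Using the cohort example from the talk: 3 of 5 smokers vs 1 of 5 non-smokers got cancer.
risk_exposed = 3 / 5
risk_unexposed = 1 / 5
relative_risk = risk_exposed / risk_unexposed
print(f"Relative risk: {relative_risk:.1f}")   # 3.0 -> greater than 1, a positive association
```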
Okay, moving forward to evidence in the orthopedic literature. I think we're all familiar with what evidence-based medicine is: the application of the current best evidence in making decisions in our practices for our patients. It categorizes different types of evidence and then ranks them based on their freedom from bias. Let's go through what that means. For JBJS, not a bad journal, this is their level-of-evidence scale. The highest level of evidence is a high-quality randomized controlled trial; we discussed why that is, since a randomized trial introduces very little bias into a study. And then we go down. Our literature, again, is full of case series, which are not great evidence. The only thing worse than that is just my opinion: I wrote an opinion piece, and that's level 5 evidence.

This is important enough that I'll give you an example. Let's say you want to do the 10,000th research paper on hamstring versus patellar tendon autografts. This is how it would play out in terms of level of evidence. If you wanted to do a level 1 trial, you'd have to do a good-quality randomized trial, and that means patients come into your office and you randomly assign them: you open the envelope, and they get either hamstring or patellar tendon. That's a pretty good study; it's been done before. Level 2 would be a prospective comparative study. We start the study today; we're not going to randomly or systematically assign anyone to a treatment, because, I don't know, maybe you'd say that's unethical, or for whatever reason, but you're collecting good-quality prospective data over time, and you're ultimately going to compare how your hamstrings do versus your patellar tendons. That's pretty good; that's level 2. Level 3 is doing the same study, but instead of starting it today, you're looking back over the last 10 years of your practice at patellar tendon versus hamstring. That introduces a little more bias, because maybe your treatment allocation was different 10 years ago than it is now, or maybe your fixation methods were different. Level 4, and again there's a lot of this in the literature, is not comparative: you're simply reporting on a cohort of patients. You report on your hamstring patients or your patellar tendon patients, but not both. It's not comparative; if it were, it would be level 2 or 3. And lastly there's expert opinion: what graft do you like to use and why? It's not data-driven, and there's a lot of bias there. And there are some things that simply have to be expert opinion, because we don't have data on them.

Okay. This is the part where people start cringing, but I'm going to make it very, very simple. There will be a few test questions on statistical analysis, and I'll make this as clear and as easy as possible, I promise you. There are two types of approaches to statistical analysis. Really, I want to take a step back and say that when we talk about statistics, we're either describing a population, what's the mean, the range, things like that, or we're seeing if there's a difference between two populations, hamstring versus patellar tendon, and trying to discern statistically whether the difference between those two populations is greater than what we'd expect from chance alone. That's a statistical test.

So let's talk about descriptive statistics first. If you capture enough people and describe them, if your population is big enough, it will start looking like a bell-shaped curve, and you describe a population by its mean, median, and mode, which describe the distribution. I'll make this very clear and not assume anything. There are 100 people registered for this course. Let's say we tell everyone to line up against this wall from youngest to oldest. I think you know what the mean is: that's the average. We add up everyone's age, divide by 100, and the mean age in this group is 50-something. So that would be the average. The median is the central value. We line everyone up again from youngest to oldest, go to person number 50, and ask, what's your age? And it could be 49. Why? Because our population may be skewed younger or older, so the mean and the median aren't always the same; as you can see, there's a skew on the bottom right graph. And then the mode: they love asking about the mode, because it's something we don't think about. The mode is simply the most repeated value. So again, we line everyone up, and let's say there are 17 42-year-olds, and that's the most common age in our population. So the mode is 42, the most commonly observed value. That's descriptive statistics. We describe the variability in a population with what's called the standard deviation.
Essentially, the standard deviation tells you how tightly or how widely distributed your group is; a bell curve can be either a tight bell curve or a wide bell curve. But the most interesting thing about the standard deviation is that if you start at the mean, which here is 0, this is normalized data, and you go two standard deviations one way and two standard deviations the other way, the area under the curve within those two standard deviations in either direction represents 95% of your population. And that's the basis of most of our statistical tests: how likely is a sample to fall within that range, 95% of the time, or outside that range, 5% of the time? This is how we actually describe that: with confidence intervals.

Here's an example. Say we have 100 people in the room, we ask everyone to take out their wallet and shake out their cash, and we measure everyone's cash and get the mean and so on. Let's say the average cash in our group is $40. The 95% confidence interval here is 37.33 to 42.67. That means we can be 95% confident that the true mean amount of cash carried in this room is between roughly $37 and $43. Where it really becomes valuable is when you take a subsample, or look at what we think is a similar population; maybe next year's group comes in. If I take a big enough subsample, say the first two rows, I can say I'm 95% confident that their mean wallet content is between roughly $37 and $43. And that's the basis for making predictions in statistics: you're trying to identify outliers. That's how you define a confidence interval.
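Here's a minimal sketch of the wallet example, assuming the quoted summary numbers (n = 100, mean $40, 95% confidence interval 37.33 to 42.67); the implied standard deviation of about $13.6 is back-calculated from those figures, not a value given in the talk.

```python
# Minimal sketch: mean and 95% confidence interval for the mean.
# n = 100 and mean = $40 are from the talk; sd ≈ 13.6 is assumed so the
# interval reproduces the quoted 37.33 to 42.67.
import math

n = 100
mean_cash = 40.0
sd = 13.6                                   # assumed, implied by the quoted interval

standard_error = sd / math.sqrt(n)          # variability of the mean, not of individuals
half_width = 1.96 * standard_error          # ~95% of sample means fall within 1.96 SE
ci_low, ci_high = mean_cash - half_width, mean_cash + half_width

print(f"95% CI for the mean: ${ci_low:.2f} to ${ci_high:.2f}")   # about $37.33 to $42.67
```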
Okay, we're getting really close to the end here. When we're applying a statistical test, you do not have to know how to calculate a statistical test, but you may need to know which one to apply, and I'll take you through a very, very easy algorithm so you can get that right. But you also have to understand what kind of data you're dealing with. Data is either continuous or categorical (discrete). Continuous data is something you can measure: height, weight, your score on a test; it can go from zero to a million, it's an infinite scale. Categorical or discrete data is yes or no; the Outerbridge score can be one, two, three, or four, but nothing in between. So you have a continuous variable or a discrete variable, and data can only be one of those two things. Then they'll want to know the nature of the data: basically, is your data normally distributed, is it a bell-shaped curve, is it a big population or not? For these test questions they always, always, always tell you that your data is a bell-shaped curve, and I'll tell you why that's important in a minute. And then the question is always which test to apply.

So this is how we do it. You don't even have to know what a t-test, chi-square, ANOVA, or paired t-test actually is. You just have to identify the following. Is your data continuous or discrete? That takes you down one of the two limbs. Is it normally distributed, a bell-shaped curve? Yes or no; they'll tell you that. And then how many groups are you comparing? If it's one group, say before and after a weight loss program, you do this one. If it's two groups, hamstring versus patellar tendon, you do this one. And if you're comparing more than two things, you do this one. That's all you need to know; you can peek at this right before the test and you'll get it right.

So here's an example of how this works. What statistical test would be used to test the hypothesis of equal weight in two groups tested in the laboratory? They tell you the data are distributed normally in each group, so it's a bell-shaped curve. We're describing weight: is that continuous or discrete? It's continuous, so we come down here, continuous. Is it normally distributed? Yes, because they told us it was. How many groups? We're comparing two things. So guess what, we're going to do a t-test, in this case a Student's t-test; a paired t-test would be the other one. Here's another example. You're testing the stiffness of four reconstructions in the laboratory. You use a large number of cadavers and the data is normally distributed; it always is, it's a bell-shaped curve. So what is stiffness, discrete or continuous? Stiffness is a continuous variable; it can take any value. It's normally distributed, and we're not comparing one thing or two things, we're comparing four. For more than two groups, you want an ANOVA, and you don't even have to know what ANOVA means; you just have to know the answer to that question. And here's a last example. Two groups of patients have been treated, and the outcome is categorized as poor, good, or excellent, or think of Outerbridge, one, two, three, or four. You want to test for an association between treatment group and outcome. What test do you use? What do we have here? We're using discrete data, not continuous data, so you do a chi-square. That's the easiest one: discrete data, do a chi-square. You probably will see a question like this, where they give you discrete or continuous data, tell you how many groups, and tell you whether it's normally distributed; just remember this and you'll get it right. Where it could trip you up a little bit, and I've never, ever seen them ask this, is, and I'll go back one slide, if it's not a bell-shaped curve and you have to go down this other road. There are a number of tests with funky names there. For every one of these tests, which are called parametric tests, there's a non-parametric test you would use instead, but please, please, please don't even think about memorizing them. Guess and you might get lucky, but I've never seen that tested.
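For readers who want to see those three test choices in action, here's a minimal sketch using SciPy. The data arrays are made-up toy values, there only to show which function matches which branch of the algorithm (continuous, normal, two groups → t-test; continuous, normal, more than two groups → ANOVA; discrete or categorical → chi-square).

```python
# Toy data, invented purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Continuous, normally distributed, two groups -> Student's t-test
weights_a = rng.normal(70, 10, 30)
weights_b = rng.normal(75, 10, 30)
t_stat, p_ttest = stats.ttest_ind(weights_a, weights_b)

# Continuous, normally distributed, more than two groups -> one-way ANOVA
stiffness = [rng.normal(m, 15, 20) for m in (100, 110, 105, 120)]  # four reconstructions
f_stat, p_anova = stats.f_oneway(*stiffness)

# Categorical outcome (poor/good/excellent) in two treatment groups -> chi-square
#              poor  good  excellent
table = [[10,   25,   15],    # treatment 1
         [ 5,   20,   25]]    # treatment 2
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"t-test p = {p_ttest:.3f}, ANOVA p = {p_anova:.3f}, chi-square p = {p_chi2:.3f}")
```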
I'm going to close on this p-value issue, because surprisingly it's a very common concept, and I want you to understand it very clearly. In our literature, I think we all know we consider p of less than .05 statistically significant. Now, we've already talked about whether something is clinically relevant or not; that's different, and we know how to deal with that using our MID. What does it mean to be statistically significant? It means there's less than a 5% chance that the observed difference between the two groups was due to chance alone. And if it wasn't due to chance alone, then it was probably due to the association with your intervention. That's all it means. We're basically trying to determine whether our results are due to chance alone, which we accept less than 5% of the time, or are associated with the intervention. That's what p of .05 means: it's statistically significant, meaning it's more likely than not that the result was due to the intervention or an association rather than chance alone.

And if you want to put yourself to sleep one night, there are a number of resources you can read. In the 90s the AOSSM charged Ed Wojtys and John Kuhn with writing a series of primers on statistics for orthopedic surgeons. If you're interested, or you can't sleep, have a look at these; they're pretty interesting. And that is the end of the statistics talk. Thank you very much. That'll conclude our course. Thanks for everything. Thank you.
Video Summary
The video is presented by Bruce Miller, a sports medicine surgeon at the University of Michigan. He discusses his approach to evaluating medical literature in a scientific way. He emphasizes the importance of being able to discern the statistical and clinical significance of research findings. Miller covers topics such as determining the clinical relevance of an intervention, evaluating the applicability and external validity of studies, understanding the importance of control groups, and knowing the outcome measures in the literature. He also explains the concepts of power, type 1 and type 2 errors, sensitivity and specificity, relative risk and odds ratios, as well as descriptive statistics and statistical tests. Miller provides examples and practical tips for understanding and applying these principles. The video concludes with Miller recommending additional resources for further reading on statistics in the orthopedic literature. No credits are mentioned.
Asset Caption
Bruce S. Miller, MD, MS
Meta Tag
Author
Bruce S. Miller, MD, MS
Date
August 11, 2019
Title
Sports Medicine Research
Keywords
Bruce Miller
evaluating medical literature
clinical relevance
control groups
outcome measures
statistical tests
orthopedic literature
statistics