Peer Review and the Randomized Controlled Trial: Helping Sports Medicine Realize the Level I Hype
Video Transcription
So, you know, I think the further we get into the era of evidence-based medicine, the easier it is to forget how important that was to improving clinical care. And one of the cornerstones of the transition to evidence-based medicine is the randomized controlled trial. So there's a lot of hype around randomized controlled trials, and a lot of it is appropriate, right? This study design can produce great results, but not all RCTs are equal, and it's still important that we critically evaluate them, so that you make sure the results actually live up to the hype, because not all RCTs are really level one. So this talk is not going to be about generic review. I think everyone in this room is comfortable with review. The point here is really to try to get into the weeds on a few issues that are somewhat specific to the review of these RCTs. We're going to talk about balancing, Table 1, and the use of inappropriate p-values. We're going to talk about outcomes, hypothesis testing, and trial registration, and about within- versus between-group comparisons and the potential for spin. And then finally, we're going to talk a little bit about the application of machine learning algorithms to randomized controlled trial data.

All right. So randomization does not ensure balancing. If you're going to take one thing away from this talk, that would be it. Even though the patients may have been randomized, you have to check and make sure that the randomization was successful and you ended up with balanced groups. This is a picture actually from the FDA's website, where they're talking about the importance of diversity in trials. Certainly, diversity in trials is important; it increases the generalizability of the trials. But it comes with some known issues. With less than an infinite sample size, there's going to be a chance that you won't have balanced groups; the treatment groups could differ. This is going to be more likely to affect smaller, diverse groups, right? If you had 20 clones of each other and you randomized them, the groups would be similar, because everyone was exactly alike to begin with. But in the real world, and especially as we try to have more diversity in trials, this is one of the unfortunate realities of that. Orthopedic RCTs are often smaller; it's just a fact. And so orthopedic RCTs may be particularly vulnerable to failing to have balanced groups.

All right. P-values in Table 1 do not assess balancing. Table 1 from a randomized controlled trial should really never have p-values. It just should not happen. The hypothesis test does not really make sense. If you take a step back and think about it, why were you doing the hypothesis test in most of these cases? Well, you're interested in knowing: is it likely these two groups could vary like they do if they really came from the same population? In a randomized controlled trial, we know they came from the same population. They came into the study, and then they were randomized. So from a conceptual standpoint, it really does not make sense to be performing hypothesis testing there. Additionally, the hypothesis testing is likely to be underpowered. This is a double down, right? So you're using the hypothesis testing to see if the groups were different.
It's more likely smaller groups are actually different, but then the hypothesis test is underpowered, so you're like, oh, I guess they're not different. It's exactly the wrong thing to be doing. So do not get confused. Do not think that p-values in Table 1 tell you anything about whether the groups are balanced after randomization. It does not. All right, so you might say, oh, man, this seems like really sort of picky. Is this even an issue? All right, so we systematically reviewed 86 randomized control trials from four leading sort of orthopedic journals, including AJSM. So some of this data is us in this room, and this is contemporary. This is July 2019 to 2020. What proportion of these RCTs do you think inappropriately reported p-values in Table 1? 24 percent, 41 percent, 58 percent, 75 percent. So these are RCTs recent in leading journals. Close. Luckily, we did a little better than that, but still 58 percent. Over half the time, we're putting in things in Table 1 that do not make sense, have the potential to mislead readers, right? So we just, you know, we probably just need to do better. But the first part of doing better is understanding that we shouldn't be doing that. All right, so if you look further at this data, you might say, okay, yeah, we're putting the p-values in, but, you know, the groups were probably balanced, right? They were randomized. You're making too big a deal out of this. Okay, so maybe. So 23 of the randomized control trials had a patient-reported outcome measure as the primary outcome, okay? And then they also reported what those values were at baseline. So I think we would all agree, if it's the primary outcome and it was different between the randomized groups at baseline, then those groups probably weren't perfectly balanced. All right, so if you look down here on the X-axis, this is the number of patients who were in the trial. On the Y-axis is a measure of how different the groups were. It's a standardized mean difference, but it's just there. All right, so here you can plot out statistically, you know, what would you expect randomly the standardized mean difference to be. And so if you have a trial with 64 patients, then you would expect, on average, that trial would have a quarter standard deviation difference between the groups. All right, so here were the 23 studies. Like you'd expect, they sort of follow the line. Some are above it, some are below it, but, you know, it kind of follows. Two takeaways here. First, as we know, many of the trials are small. So many of the trials were in the range where you would have expected there to potentially be differences, even after randomization. Second takeaway, 30 percent differed by more than a quarter of a standard deviation. Like this is a real issue. Three of those reported P values that were over .05. So if you didn't understand, you weren't supposed to be looking at the P value, and you're just reading this, you're like, oh, okay, I guess the groups are similar, right? The P was greater than .05. This is the double down, and it is happening all the time. All right, so then the question is, well, what should you do? How should you assess balancing? All right, so evaluate the difference in the baseline characteristics between the groups. All right, you got to know what is the absolute difference. The absolute difference does matter, but it only is a part of it, right? What is the actual association between that variable and the outcome? 
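A minimal simulation sketch of the chance-imbalance point above, not from the talk, assuming a single normally distributed baseline covariate and 1:1 randomization. It estimates how large the chance difference between arms tends to be at a given trial size, and how often the Table 1 t-test would still look reassuring even when the arms differ by at least a quarter of a standard deviation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_imbalance(n_total, n_trials=20_000):
    """Randomize n_total patients 1:1 and record the baseline standardized
    mean difference (SMD) and the Table 1 t-test p-value for each trial."""
    n_arm = n_total // 2
    smds, pvals = [], []
    for _ in range(n_trials):
        a = rng.standard_normal(n_arm)  # baseline covariate, arm A
        b = rng.standard_normal(n_arm)  # baseline covariate, arm B
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        smds.append(abs(a.mean() - b.mean()) / pooled_sd)
        pvals.append(stats.ttest_ind(a, b).pvalue)
    smds, pvals = np.array(smds), np.array(pvals)
    imbalanced = smds >= 0.25  # arms differ by at least a quarter SD
    print(f"n={n_total:3d}: mean |SMD| = {smds.mean():.2f}, "
          f"P(|SMD| >= 0.25) = {imbalanced.mean():.0%}, "
          f"of those, Table 1 p > .05: {(pvals[imbalanced] > 0.05).mean():.0%}")

for n in (32, 64, 128, 256):
    simulate_imbalance(n)
```

At around 64 patients the chance imbalance is in the quarter-standard-deviation range described above, and most of those imbalances still carry a reassuring-looking Table 1 p-value above .05, which is the double down.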
Because if that variable that the groups differ on doesn't affect the outcome, then it's not going to confound things. It's not going to be that big a deal. So obviously, this is going to be somewhat subjective, right, because there's a little bit of an intuition. And this is why it's really important that content experts like everyone in this room are the people evaluating these articles, because you're going to have the best sense of what to do. Here's what I do when assessing balancing, right? This is a little bit like surgery in the sense that it's not until you go to explain to somebody how you do it that you realize, wow, there were actually a little bit more going on than I thought. Step one, identify what variables matter, right? Some may have been set by the inclusion or exclusion criteria and are destined to be the same. If not, were they measured and how? Like if there are important variables, were they even measured so that you could know whether the groups were balanced? You can't measure what you can't see. Examine the absolute difference per variable for the variables that matter. Step three, consider the strength of the association for those variables with the outcome. Step four, multiply in your mind somewhat subjectively, somewhat conceptually, how much difference there was between the groups by how much that variable mattered, and then is it a big deal, little deal, no deal, right? Yes, subjective, but it gives you a sense of how to then interpret differences in the outcomes. You may say, okay, well, why are there four more steps, right? Why are there eight steps? It seems like we're done. Step five, check if the difference was acknowledged, right? Did the authors mention this difference? Please check and make sure they didn't include p-values, especially misleading p-values that could confuse some readers into thinking, oh, it wasn't significant, I guess these groups are the same. No, that's, again, irrelevant. Step six, check if anything was done to address the imbalance. Okay, you know, there are things that can be done, statistical adjustments, stratified analyses, there are things that could be done. Step seven, check if any of those adjustments were pre-specified in the registration and how the trial was sort of planned. Step eight, consider if the adjustment was enough, right? Sometimes you may end up with groups that are so imbalanced that trying to statistically adjust may not be enough. Again, a little bit of a subjective. Someone may say, oh, did you, I mean, why step seven, right? Do you really need to go back and check if this was registered? That's going to get us to the second part of this talk, primary outcome hypothesis testing and trial registration, right? What are trials about, right? Are we doing randomized control trials? Because we want to know if injecting some PRP at the end of a meniscal repair is associated with a 3% better chance that it heals versus a 7% better chance that it heals? Or are we interested in doing the trial to know whether or not we should be injecting the PRP? In the vast majority of the trials we're doing, we're not interested in prediction, we're interested in inference. We're interested in taking that data and knowing should we be doing PRP injections at the end of the meniscal repair, yes or no? Okay? So it's about inference. And because it's about inference, hypothesis testing is paramount, right? Because we're going to test a hypothesis and we're going to reject it or not and that's the deal. 
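Looping back to steps one through four above, a small hypothetical sketch of that assessment with invented Table 1 summaries: tabulate the absolute and standardized difference for each baseline variable that matters, and weigh each against a frankly subjective judgment of how strongly it drives the outcome.

```python
# Hypothetical Table 1 summaries: (mean in arm A, mean in arm B, pooled SD),
# plus a subjective 0-1 weight for how strongly the variable drives the outcome.
baseline = {
    "age_years":     (27.0, 31.0,  8.0, 0.8),
    "bmi":           (24.5, 24.9,  3.5, 0.3),
    "baseline_koos": (55.0, 49.0, 14.0, 0.9),
}

print(f"{'variable':<14}{'abs diff':>10}{'SMD':>8}{'importance':>12}{'concern':>10}")
for name, (mean_a, mean_b, sd, importance) in baseline.items():
    abs_diff = abs(mean_a - mean_b)
    smd = abs_diff / sd          # standardized mean difference, no p-value needed
    concern = smd * importance   # step four: size of difference times relevance
    print(f"{name:<14}{abs_diff:>10.1f}{smd:>8.2f}{importance:>12.1f}{concern:>10.2f}")
```

Nothing here replaces the Table 1 p-value with another test; the output is just a structured way of doing the mental multiplication described in step four.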
So the decision to reject the hypothesis is what this whole thing hinges on. So you have to protect the type one error. You have to protect the alpha. Dr. Carey did a very nice job of discussing this in the last lecture, but this can get inflated very quickly, and we're going to go through some examples.

All right. The not-so-primary outcome. Everyone's like, oh, yeah, I know what the primary outcome of the study was, right? It was the KOOS score. Okay, great. Was it the total score or was it a domain score? Was that specified? Because each of these patient-reported outcome measures quickly devolves into six subscales. Or even something like, oh, we were interested in whether patients had complications. Well, which complication, right? A complication can easily devolve into six of its own. Time point. Okay, do you want to know about the KOOS score at three months or at two years? All of these are different, right? The two groups may do the same at three months and then differ at two years. Those are two different outcomes. Raw score versus a change score. Assume a distribution or dichotomize, right? Do you want to know the absolute difference, or do you want to know the proportion in each group who achieved a minimal clinically important difference? Those are two different outcomes. Missing data imputation, right? Did you just analyze the patients who actually had the data, or did you analyze those patients plus imputed data for the others? Those are two different things.

Alpha inflation, multiple hypothesis tests. This was already covered a little bit, so I'm not going to belabor it. But only one test, right? You have one primary outcome, and then you get to test that once. That's how you protect alpha. That's how you keep the type one error rate at .05. So what if you do adjusted versus unadjusted analyses? Okay, well, now you've taken that same outcome and you've tested it twice. Or maybe more, right? If you adjust for two variables in one model and for five in the next model, now you've done it three times. Management of baseline scores, right? Did you then do another model where you incorporated those? You've got to make sure it's consistent with the primary outcome and consistent with the power analysis, right? You don't want to do a power analysis for one outcome and then have a different outcome be the primary.

All right, so what are we not seeing? In these studies, you only see what the authors report. But there could have been a bunch of other stuff done behind the scenes that inflated the alpha in a way you don't know about. Think back to the database studies, right? Assuming everyone's being completely forthright and they kept track of what they did, you know what they did. But you don't know how many other investigators have gone through those databases. You just don't know. You can only know what you see here.

All right, so the trial has to be registered. That's how you keep things transparent, that's how you keep track of what the primary outcome really was and how it was supposed to be tested, and that's how you avoid the potential that things were tested a million times. The outcome should be consistent, the planned hypothesis consistent, the power analysis consistent. Everything needs to be specific and everything needs to be consistent. You do that, you protect alpha. Inflation is everywhere, as we all know too well.
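A quick worked illustration of how fast alpha inflates when one outcome gets looked at several ways. It assumes the tests are independent and each run at alpha = .05, so treat it as a rough upper bound; repeated analyses of the same outcome are correlated.

```python
alpha = 0.05
for k in (1, 2, 3, 5, 10, 20):
    # Chance of at least one false positive across k independent tests of a true-null outcome
    family_wise = 1 - (1 - alpha) ** k
    print(f"{k:2d} looks at the data -> {family_wise:.0%} chance of at least one 'significant' result")
```

Twenty looks give roughly a 64 percent chance of at least one spuriously significant finding, the same figure that comes up in the question-and-answer discussion below.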
So I cannot answer the question of how we limit inflation without causing a recession, but how do we limit the inflation of alpha? That we can answer. Trial registration, critically important. We're very lucky; Dr. Reider was one of the earliest people to bring this topic to the forefront in sports medicine and orthopedics back in 2012. He wrote a very nice editorial on this, and AJSM was one of the first journals to have a policy requiring trial registration. A few years later, some of the other journals followed suit. It is important that everyone does it, right, because if only one journal does it, then you can just take the non-registered trials to the other journals. So it is a team effort here.

But even with this, is it enough? Registering trials is great. This is a nice article from 2012, and I think it remains relevant; I don't think there's really been any change in this data, sadly. They looked back at randomized controlled trials that were registered and had been completed for at least three years. Forty percent had not been published yet. Eighty percent deviated from the registration, so when they read the article, it didn't match up with how it was registered. This is just the reality. It is humbling and it's a little sad, but this is where we are in 2022. Can we do better? I don't know. Maybe. This is maybe the part of the talk that's a little bit controversial, but is anonymity a barrier to the review process? For randomized controlled trials, should we unblind reviewers so that they can check trial registrations and there are other sets of eyes looking at those things? I don't know. Maybe. But that doesn't solve it. Who's checking for unpublished research? Who does that? No one, really. And that's one of the challenging things in orthopedics: the FDA is not as heavily involved in some of our stuff. It's not like a new beta blocker, where you can't prescribe it until it's approved by the FDA. So I'm not really sure how we solve that.

All right. This gets to the next issue, spin. Randomized controlled trials, why are we using them? It's for inference. It's to compare two treatments. You estimate a relative treatment effect. That's the whole point: how did one group do relative to the other? It's not how each group did on their own; it's how the groups did relative to one another. So you've got to think about what the comparison group was. There are a lot of randomized controlled trials where there are really two experimental treatments, and that's okay. But that trial is going to tell you how those treatments do relative to one another. It doesn't tell you really anything about how those treatments do relative to treatments that weren't in the trial. And that is an important distinction. This is why we focused so much on balance and protecting the alpha, limiting the type one error rate: it's all about protecting this between-group comparison.

All right. You might say, yeah, but wait, isn't an RCT really just two simultaneous cohorts? Can't I interpret it that way if I want to? Maybe. Yes, technically that is true: you did take some patients, you did something, and then you followed them. So you could ignore the fact that it was an RCT. I would be cautious.
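As an aside before the reasons for that caution, a minimal simulation, not from the talk, of the spin risk in the between-group point above: when both arms improve from baseline, within-group before-versus-after tests can be highly significant in each arm even when the true between-group treatment effect is zero. All numbers are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_arm = 40
true_improvement = 10.0   # both arms improve this much on average
treatment_effect = 0.0    # no true between-group difference

baseline_a = rng.normal(50, 15, n_arm)
baseline_b = rng.normal(50, 15, n_arm)
followup_a = baseline_a + rng.normal(true_improvement, 12, n_arm)
followup_b = baseline_b + rng.normal(true_improvement + treatment_effect, 12, n_arm)

within_a = stats.ttest_rel(followup_a, baseline_a).pvalue  # paired test, arm A only
within_b = stats.ttest_rel(followup_b, baseline_b).pvalue  # paired test, arm B only
between = stats.ttest_ind(followup_a - baseline_a,
                          followup_b - baseline_b).pvalue  # change scores, between arms

print(f"Within arm A improvement: p = {within_a:.4f}")
print(f"Within arm B improvement: p = {within_b:.4f}")
print(f"Between-arm comparison:   p = {between:.4f}")
```

The between-arm comparison on the change scores is the only one the trial was designed to make; leading with the two within-arm p-values is the spin to watch for.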
RCTs often have stricter inclusion criteria, so this data may not be as generalizable as you thought it was going to be. Hawthorne effect, right? When people are in a trial, they're studied much more closely than they are in real life, and their treatment may be much more protocolized than it would be in real life, so it may not mimic cohort data as well. And then cost often limits follow-up, so these studies are usually going to be smaller and have more limited follow-up than some cohort studies would have. So the best takeaway here is that if you choose to go down this route of interpreting some of the patients from a randomized controlled trial as a cohort, this is no longer level one evidence. This is essentially a cohort, and in many cases, I would argue, not a great cohort.

All right, so haven't I and other people advocated that we should do a bunch of extra analyses with RCTs? This is confusing. You're making such a big deal out of one outcome, one hypothesis test, but then there are all these people talking about doing tons of extra analyses. It's true, you do want to make the most of the data that you can. But you have got to distinguish the primary report from all of the secondary reports. Only the primary report is level one. Everything else could be good, could be useful, could be hypothesis generating, but that is not the primary outcome. That is not the level one evidence emerging from that randomized controlled trial.

All right, so this gets us into the last topic, machine learning algorithms and RCT data. This is a hot topic. Everyone's trying to figure out when we should use machine learning. I'm not going to be able to answer that for you. Is this going to turn out to have been revolutionary for orthopedics? I do not know. RCTs are high-quality data, generally prospective, and usually have a clinically important outcome, right? Why wouldn't we apply machine learning algorithms? There are probably going to be some benefits to doing that. I'm just going to go over a few questions to consider, a few cautions.

One, is the sample size large enough? We talked about how RCTs tend to be small. It's just the reality of doing them; they're expensive. And they're designed to detect a difference in a primary outcome. They're designed for inference, not prediction. Machine learning algorithms are sort of the opposite: they capitalize on having an immense amount of data. And so it's not clear to me that the sample size in most RCTs is going to line up with what's necessary for these algorithms to work well. So I would just be cautious, first, on the sample size.

Next, is the population heterogeneous enough? One of the ways randomized controlled trials become more efficient and limit costs is by strict inclusion and exclusion criteria, trying to minimize the variability going in. But then, when you think about building a prediction model, well, you want a bunch of variation, so that it can predict other variation. And if you didn't start with very much variation, it's going to be awfully tough to then use that data to develop some kind of really great prediction model. And then, let's say you do. Let's say you develop a model and it has pretty good predictive ability, you think, but it's from a randomized controlled trial.
Well, if you then take that into the real world, where the patients are a lot different, you might see that your model doesn't perform so well. So I would definitely be critical, and I would even more so want to see external data used to assess the generalizability and the performance of those models. And then, finally, this is unclear to me: what do you do about the treatment? Let's say the treatment in the trial worked, and then you're using this machine learning to build a predictive model. Do you only include the treated group? Also, what if, in the real world, selection bias toward getting treated is actually an important predictor of the outcome? The RCT data is never going to be able to incorporate that, because people got randomly selected for the treatment; they didn't self-select into it with the help of their surgeon. And then, if treatment affects the outcome, should models be stratified by the treatment? There is a lot that's unclear here.

All right. So just to recap: assess balancing after randomization, and do not use p-values. In a randomized controlled trial, you're going to randomize patients into two treatment groups. Just make sure those groups look similar, look comparable, after the randomization, because it is not a guarantee that they will be, especially for smaller trials. Respect the primary outcome. We want to avoid inflation of alpha, avoid inflation of the type one error rate. You do that by being consistent and specific; the primary outcome should be as specific as possible. Focus on between-group comparisons with RCTs. That is the point of the RCT. It's not to say, oh, both groups do great; you could do a cohort study for that and have a more generalizable group of patients in it. The point of the RCT is the between-group differences. And then finally, it's important that we make the most of the RCT data, but we should probably be careful.

All right. This talk did not discuss a lot of things: different trial designs, like non-inferiority trials. I would say those are not super common yet in orthopedics; of those 86 studies we reviewed, 85 were traditional superiority RCTs. In the future, that may be a bigger issue, along with cluster randomized controlled trials and things like that. We didn't discuss strategies for multiple primary outcomes. Dr. Carey had started to get into this a little bit, but the FDA has issued guidance on this, and some pharmaceutical companies are doing this, where you can actually set up the trial in advance to have multiple primary outcomes. That's not really happening yet in orthopedics, but it may at some point. Treatment heterogeneity: does the treatment work as well for men and women? Does it work as well for old and young? We didn't talk about that, but that's an increasingly hot topic and important to consider. And then mediation analysis, like why do treatments work? That's something you can explore using RCT data. All right, thank you.

Yes, please. Can you expand a little bit on that concept of the secondary outcomes not being level one data? It seems like if you have 80% follow-up, it was pre-registered, and you got a statistically significant secondary outcome in a prospective trial, doesn't that, by definition, make it level one? No. So the issue is, what if you had 20 secondary outcomes? Then if you go back to Dr.
Carey's example, there's a 64% chance that one of those 20 is going to be statistically significant. Do you think you would adjust for it with the... Well, yes. So that you could, but everything would have to be pre-specified. And so what you can do is you can say, and then I would say they're not secondary outcomes. I would say that's an example where you have multiple primary outcomes. And you can do that, and you can keep your alpha, your type one error rate at .05. There's a couple ways to do it. One, so you could say, listen, we're going to have two primary outcomes, and we're going to test each at .025. Or you could say, listen, we're going to have two primary outcomes, we're going to test outcome A first, and then if and only if outcome A is statistically significant, then we'll move on to outcome B. It's like gatekeeping. There are novel techniques that are being actively worked on to do that, to try to like make the most of this data, but if it wasn't pre-specified and somebody just said, hey, you've got this one primary outcome, this is how we're going to test it, this is what we're doing, and then we've got all these other things, the other things are not the level one evidence. They're interesting, they're hypothesis generating, they're add-on. You can't just publish a trial with one, you know, like, I mean, there's too much data there, right? Like, you kind of want to pull as much out of it as you can. You do, for sure, for sure. You do want to pull as much out of it as you can, but again, you've got to remember, what's the point of doing the trial, right? It's, in my opinion, largely for inference, is you're trying to make a decision, are we going to start injecting all the patients we do a meniscal repair with PRP, yes or no? And so you've got to make a decision, and, you know, usually the way we set the trials up is we make the decision based on the primary outcome. Again, I know that that seems like a pretty rigid stance. A lot of this is, you know, based on, and it may be purposeful and may be often not purposeful, malpractice or poor use of research in the past, but you look at FDA, right? Like these pharmaceutical companies and FDA, they go back and forth, like, what's going to be the outcome? How are we, what's going to define whether this drug gets approved or not? And then that's it. And if those companies could then say, oh, yeah, the primary one didn't work, but look at all these secondary outcomes that were better, we'd have a lot more drugs on the market. Yeah. That's a great question, though. Yes, please. So, appreciate it. This is very helpful. And this is all about randomized clinical trials. We still get asked on occasion to review retrospective cohorts, any bullet points or thoughts on what's going to make that type of paper valuable? Yeah, I would echo some of what Dr. Carey said, right? So there are certain things that aren't going to be studied with a randomized controlled trial, right? And the outcome's really rare, like an infection after a procedure that was after a steroid injection. Like, that's so rare. How would we ever randomize enough people to study that? You know, also some things you're not going to want to randomize patients to. And so I think those are, can be well studied with a retrospective cohort. But again, I think, you know, some questions you're never going to get true level one evidence for. So the best you're going to have is a retrospective cohort. 
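Looping back to the two strategies just described for pre-specified multiple primary outcomes, a hypothetical sketch of how the alpha-splitting and fixed-sequence (gatekeeping) rules differ. The p-values are invented purely for illustration.

```python
def split_alpha(p_a, p_b, alpha=0.05):
    """Two co-primary outcomes, each tested at alpha / 2."""
    return {"A significant": p_a < alpha / 2, "B significant": p_b < alpha / 2}

def gatekeeping(p_a, p_b, alpha=0.05):
    """Fixed-sequence testing: outcome B is only tested if outcome A succeeds,
    so each test actually performed can use the full alpha."""
    a_significant = p_a < alpha
    b_significant = a_significant and (p_b < alpha)
    return {"A significant": a_significant, "B significant": b_significant}

p_outcome_a, p_outcome_b = 0.03, 0.04  # invented results
print("Split alpha: ", split_alpha(p_outcome_a, p_outcome_b))
print("Gatekeeping: ", gatekeeping(p_outcome_a, p_outcome_b))
```

Either rule keeps the family-wise type one error rate at .05, but only if it was written into the protocol before the data were analyzed; picking between them after seeing the p-values is exactly the inflation discussed earlier.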
But if it is a question that could be studied and would be appropriate for an RCT, then yeah, I think you are less enthusiastic about the retrospective study. And you say, hey, yeah, this is interesting. This is the data that you would use to then justify the RCT. But you know, again, there are examples of things where you're just not going to be able to study those prospectively. Yes. One more question. Sure. Is there anything done at AJSM to verify the trials versus the registration, because this does come up. Yeah, we. Especially for international studies, right, where they may be registered in a database we don't have access to. Like, at least in clinicaltrials.gov, you can kind of keyword search it and maybe find it. Is there any editorial level connection between them? Yes. All of them are being checked internally against their trial registration. And again, and that's the other reason, like, even though that article is from 2012, it is still common. And again, I think, you know, in the medical world with beta blockers or whatever, these statins and all this stuff, like, you know, I think sometimes maybe people were purposely trying to, like, you know, play fast and loose. I think in orthopedics most of the time when the report doesn't match up with the registration, it's usually that, like, somebody's fellow five years ago registered the trial and then the new fellow wrote the study up and, like, at some point along the way people forget. I don't think it's usually intentional. But it's still important that we try to keep things as specific and consistent as we can. Yes, please. Yes, so basically those are, like, so somebody does a randomized control trial and they're writing up the results and then in the table one of the study they'll say, oh, these were the characteristics of the group that got the PRP, these are the characteristics of the group that didn't get the PRP, and then here's a p-value comparing those characteristics between the groups. That p-value is meaningless because what matters is what was the absolute difference between the groups, not whether it was statistically significant. Because maybe it was a small trial and maybe the groups were really different, but those differences didn't quite reach statistical significance. They're still different. The groups are still different. The p-value is meaningless. Also, we know for sure that the patients who got the PRP and the patients who didn't came from the same population because we randomized them from that population. So the hypothesis testing, the hypothesis itself doesn't really make sense. Yes. So in the absence of a p-value in table one, doing absolute differences, what objective way do you have to decide what absolute difference is important or not? Because it seems to be potentially subjective. It is definitely subjective, unfortunately, but I think, you know, one thing that is nice about orthopedics and a lot of clinical specialties is we have a sense, or at least we think we do, of what variables are predictive of the outcome. So we can then look, we should then be measuring those so that we can look to see if the groups are balanced. And then because we know what we think is predictive of the outcome, we can then, it is subjective, but it doesn't mean it's wrong, right? We can assess how big a deal it is, right? So let's say one group, it turns out you randomized patients and one group is, it ends up predominantly professional athletes and the other group ends up predominantly high school athletes. 
Well, if it's, you know, return to sport, I don't know, maybe that matters a little bit, right? Whereas if it's some other outcome, maybe it doesn't. And so, again, that is subjective, but you still have to do it. Yes. We'll have a question from Dr. Rabin. Ask them to remove the p-values from Table 1, yeah, because again, a lot of readers are going to know they should ignore those, but there are some who may not, and then they may be misled, and they may be misled into thinking, oh, the groups were the same because the p-values weren't statistically significant when, in fact, the groups were different. And so, yeah, just tell, you just say, hey, listen, doesn't really make sense why you have p-values in Table 1. I would comment, would recommend removing. Yes, please. No, related to that, if you have Table 1 p-values, every now and then there'll be a study that actually does have a significant difference. I think that's really important to know. So, that to me is a value of having p-values there. Yeah, I would just, I think the important thing was the absolute difference between the groups, and whether that absolute difference was, I don't think knowing it was statistically significant or not adds a lot beyond knowing the absolute difference. Because if the absolute difference is legitimate, and you think that the fact that these groups differed in that way could affect how their outcomes differ, then it, I don't see why knowing it's statistically significant or not would matter a lot. Yes, please. You said assessing the potentially imbalanced factor ought to get its effect on the primary outcome, but then you also said that that's largely subjective. How do you assess that without a statistical test? And then doesn't that run into the same problems that you were just talking about earlier? Yeah, it is subjective for sure, right? And there's not going to be, you know, we randomize patients because we want the groups to be, you know, similar. But when you don't have an infinite population you're randomizing, the groups may not be. You know, you're going to get unlucky sometimes. And especially in smaller trials, there is a reasonable chance you may get unlucky. And then, yeah, each, you know, if it's a variable that doesn't affect the outcome, it doesn't matter if the groups differ. But the only way you can check for that is with a statistical test. No, no, no, no. You look at the absolute difference. So you say, listen, one of these groups, just by chance, the average age was 38. The other group, the average age was 22. I think age affects this outcome of patellar redislocation. So, therefore, I'm going to have to be critical in assessing the outcome because I think the groups weren't balanced. And I think that imbalance between the groups may be what's driving the difference in the outcome, not the treatment. But that sounds more like a hypothesis than a conclusion, right? Like I think that the imbalance in the age group is affecting... Yeah, it is. It is. Yes. Okay. Yeah. And again, that's why you check to see, you know, did the authors acknowledge this difference and what were their thoughts on it, right? Did they then try to do adjustments for it, right? Especially for a smaller trial. I mean, this is a little bit beyond this talk. But for smaller trials, it's increasingly done that the plan pre-specifies that you're going to include these variables. 
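A minimal sketch, not from the talk, of what a pre-specified covariate adjustment can do in a small trial where randomization happened to leave the arms imbalanced on a strongly prognostic variable, loosely echoing the age example above. All numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_arm = 30

# Hypothetical small trial: arm A happened to randomize older, and age strongly
# drives the outcome, while the true treatment effect is zero.
age_a = rng.normal(38, 6, n_arm)
age_b = rng.normal(30, 6, n_arm)
true_treatment_effect = 0.0
outcome_a = 80 - 0.8 * age_a + true_treatment_effect + rng.normal(0, 5, n_arm)
outcome_b = 80 - 0.8 * age_b + rng.normal(0, 5, n_arm)

# Unadjusted estimate: difference in means, which absorbs the age imbalance
unadjusted = outcome_a.mean() - outcome_b.mean()

# Pre-specified adjustment: least-squares fit of outcome ~ treatment + age
outcome = np.concatenate([outcome_a, outcome_b])
treat = np.concatenate([np.ones(n_arm), np.zeros(n_arm)])
age = np.concatenate([age_a, age_b])
X = np.column_stack([np.ones_like(outcome), treat, age])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = coef[1]  # coefficient on the treatment indicator

print(f"True treatment effect:  {true_treatment_effect:.1f}")
print(f"Unadjusted difference:  {unadjusted:.1f}")
print(f"Age-adjusted estimate:  {adjusted:.1f}")
```

The unadjusted difference mostly reflects the age imbalance, while the adjusted estimate lands near the true null effect; the adjustment only carries its usual interpretation, though, if it was planned in advance rather than chosen after looking at the results.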
Also, if you know you're doing a small trial and you know the variable is strongly associated with the outcome, you need to stratify the randomization by that variable. Then you're guaranteed to get the groups similar. So usually, in a well-done trial that was pretty well thought out, this is probably not going to be a big issue. But it should still be checked for.

Yes. In that example, if you found the difference, that one group was older than the other, could you encourage the authors then to stratify by age in their comparisons? For sure. That's definitely a way that you could look at it. And it's optimal if that was pre-specified, because if it's not, then you get into this issue of, okay, well, now we've tested one hypothesis three different times, so we actually have done three hypothesis tests. That's where it can be helpful to pre-specify. Some trials will say, listen, we know there's a chance the groups are not going to be balanced; if the groups differ on one of these factors by more than half a standard deviation, then we're going to do a model and include that variable in it. And that's how they keep the one outcome, one hypothesis test.

Yes. So for reviewers who are trying to decide whether to reject a paper, ask for revisions, or accept it, and you see a paper that has a compelling outcome and there are randomization issues, and they address those randomization issues, but it's still not perfectly randomized or balanced, what would you say? Is that appropriate for publication in AJSM or elsewhere? I think it depends. But yeah, no study's going to be perfect, right? I think we want to recognize the imperfections in the study. But just because there are small imperfections doesn't mean something shouldn't be published. It does mean they should be noted, right? Because then the authors have a chance in the discussion to mention it and say, hey, listen, this is sort of associated with the outcome, it's not that strongly associated with the outcome, and we then did some secondary analyses that didn't really suggest things would have changed. And then, buyer beware. But if you don't point it out, then there's no chance to know. Awesome. Thank you very much.

Thank you, David. Both these presentations have been great. You know, I started doing these seminars 20 years ago because I realized there's so much that I didn't know, and I wanted to learn more. And I thought, well, maybe our reviewers don't know more than me, so maybe they want to learn these things, too. And this is why we have these associate editors who amplify the knowledge that we have when we evaluate the papers. I see a number of our other editors here, Dr. Ganley, Dr. Fleming, Dr. Foster. I think Dr. Washer is around, too. They all bring something special to the journal, and you do, too. So I want to thank you. So many of you are reviewers, and you came here today, and it's great to see this room full of people who want to learn to be even better. We have so many wonderful reviewers, and yet you want to get better, which is fantastic. And we want to do whatever we can to help you, and that's why we have these seminars. And if there are some people who aren't reviewers and would like to be, Donna Tilton in front here, who's wearing a red blouse and is known to many of our reviewers, will be happy to sign you up.
So just talk to Donna afterwards, give her your card, and we'll get you reviewing as soon as possible. And before I go, I also want to point out that we're really honored to have the president of the American Academy of Orthopaedic Surgeons, none other than Felix "Buddy" Savoie, here, which is really a special honor for us. So thank you, Buddy, for coming. And thank you all for coming. Have a wonderful day. And if you're on the editorial board, we'll be regathering at 1:30 here.
Video Summary
The speaker begins by discussing the importance of evidence-based medicine and how randomized controlled trials (RCTs) are a cornerstone of this approach. However, not all RCTs are equal, and it is important to critically evaluate them to ensure the results live up to the hype. The speaker then delves into specific issues related to the review of RCTs, including balancing, Table 1, and the use of inappropriate p-values. They emphasize that randomization does not ensure balance, so it is necessary to check whether the randomization was successful and the groups ended up balanced. They also explain that p-values in Table 1 do not assess balance and should not be used as indicators of group similarity. The speaker highlights the importance of maintaining the integrity of the primary outcome and avoiding inflation of alpha through multiple hypothesis testing. They also discuss the potential application of machine learning algorithms to RCT data, noting that caution should be exercised regarding sample size, population heterogeneity, and treatment effects. The speaker concludes by noting that while the talk did not cover all aspects of RCTs, such as non-inferiority trials and treatment heterogeneity, it is important to make the most of RCT data while being mindful of its limitations.
Asset Caption
David C. Landy, MD, PhD
Keywords
evidence-based medicine
randomized controlled trials
critical evaluation
group balancing
primary outcome
machine learning algorithms
limitations