Peer Review and the Randomized Controlled Trial: Helping Sports Medicine Realize the Level I Hype
Video Transcription
So, you know, I think the further we get into the era of evidence-based medicine, the easier it is to forget how important that was to improving clinical care. And one of the cornerstones of the transition to evidence-based medicine is the randomized controlled trial. So there's a lot of hype around randomized controlled trials, and a lot of it is appropriate, right? Because the study design can produce great results. But not all RCTs are equal, and it's still important that we critically evaluate them, so that you make sure the results actually live up to the hype, because not all RCTs are really level one. So this talk is not going to be a sort of generic review; I think everyone in this room is comfortable with review. The point here is really to get into the weeds on a few issues that are somewhat specific to the review of these RCTs. We're going to talk about balance, Table 1, and the use of inappropriate p-values. We're going to talk about outcomes, hypothesis testing, and trial registration; within- versus between-group comparisons and the potential for spin. And then finally, we're going to talk a little bit about the application of machine learning algorithms to randomized controlled trial data.

All right, so randomization does not ensure balance. If you're going to take one thing away from this talk, that would be it. Even though the patients may have been randomized, you've got to check and make sure that the randomization was successful and you ended up with balanced groups. This is a picture from the FDA's website, where they're talking about the importance of diversity in trials. Certainly, diversity in trials is important: it increases the generalizability of the trials and their ease of use. It comes with some known issues, though. With anything less than an infinite sample size, there's going to be a chance that you won't have balanced groups; the treatment groups could differ. This is going to be more likely to affect smaller, more diverse samples. If you randomized 20 clones of each other, the groups would be similar, because everyone was exactly alike to begin with. But in the real world, and especially as we try to have more diversity in trials, this is one of the unfortunate realities. Orthopedic RCTs are often smaller, and that's just a fact, so orthopedic RCTs may be particularly vulnerable to failing to have balanced groups.

All right: p-values in Table 1 do not assess balance. Table 1 from a randomized controlled trial should never have p-values. It just should not happen. The hypothesis test does not really make sense. Take a step back and think about why you do the hypothesis test in most cases: you're interested in knowing, is it likely these two groups could vary like they do if they really came from the same population? In a randomized controlled trial, we know they came from the same population. They came into the study, and then they were randomized. So from a conceptual standpoint, it really does not make sense to be performing hypothesis testing there. Additionally, the hypothesis testing is likely to be underpowered. This is the double-down, right? You're using the hypothesis test to see if the groups were different. Smaller groups are more likely to actually be different, but then the hypothesis test is underpowered, so you conclude, oh, I guess they're not different. It's exactly the wrong thing to be doing. So do not get confused. Do not think that p-values in Table 1 tell you anything about whether the groups are balanced after randomization. They do not.

All right. You might say this seems really picky; is this even an issue? We systematically reviewed 86 randomized controlled trials from four leading orthopedic journals, including AJSM, so some of this data is us in this room, and it's contemporary: July 2019 to 2020. What proportion of these RCTs do you think inappropriately reported p-values in Table 1: 24 percent, 41 percent, 58 percent, 75 percent? These are recent RCTs in leading journals. Close. Luckily, we did a little better than that, but still 58 percent. Over half the time, we're putting things in Table 1 that do not make sense and have the potential to mislead readers. We probably just need to do better, but the first part of doing better is understanding that we shouldn't be doing that.

If you look further at this data, you might say, okay, we're putting the p-values in, but the groups were probably balanced, right? They were randomized; you're making too big a deal out of this. Maybe. Twenty-three of the randomized controlled trials had a patient-reported outcome measure as the primary outcome, and they also reported what those values were at baseline. I think we would all agree that if the primary outcome was different between the randomized groups at baseline, those groups probably weren't perfectly balanced. On the x-axis is the number of patients in the trial; on the y-axis is a measure of how different the groups were, a standardized mean difference. You can plot out statistically what you would expect the standardized mean difference to be just by chance, and if you have a trial with 64 patients, you would expect, on average, that trial to have about a quarter of a standard deviation difference between the groups. Here were the 23 studies. Like you'd expect, they roughly follow the line; some are above it, some are below it. Two takeaways here. First, as we know, many of the trials are small, so many of the trials were in the range where you would have expected there to potentially be differences, even after randomization. Second, 30 percent differed by more than a quarter of a standard deviation. This is a real issue. Three of those reported p-values that were over .05, so if you didn't understand that you weren't supposed to be looking at the p-value and you were just reading this, you'd think, okay, I guess the groups are similar, the p was greater than .05. This is the double-down, and it is happening all the time.
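[Editor's illustration, not part of the talk.] To put rough numbers on how much baseline imbalance pure chance produces, here is a minimal Python sketch: it repeatedly splits one common source population 1:1, as randomization does, and records the absolute standardized mean difference (SMD) of a single baseline variable. The function name and sample sizes are illustrative choices; the exact figure depends on whether you summarize the average absolute difference or the spread of differences, but for a trial of around 64 patients it lands in the same ballpark as the quarter of a standard deviation quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_abs_smd(n_patients, n_sims=20_000):
    """Average absolute standardized mean difference of one baseline
    variable between two arms after simple 1:1 randomization."""
    half = n_patients // 2
    diffs = np.empty(n_sims)
    for i in range(n_sims):
        # Everyone is drawn from the same population, exactly as in a trial.
        x = rng.normal(0.0, 1.0, n_patients)
        a, b = x[:half], x[half:]          # splitting an i.i.d. sample 1:1 is randomization
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        diffs[i] = abs(a.mean() - b.mean()) / pooled_sd
    return diffs.mean()

for n in (32, 64, 128, 256, 512):
    print(f"n = {n:>3} patients: expected baseline |SMD| ~ {mean_abs_smd(n):.2f}")
```

The same calculation run on each Table 1 variable (absolute difference divided by pooled standard deviation) is a more informative check of balance than any p-value.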
So then the question is: what should you do? How should you assess balance? Evaluate the difference in the baseline characteristics between the groups. You've got to know what the absolute difference is. The absolute difference does matter, but it's only part of it; the other part is the actual association between that variable and the outcome. Because if the variable the groups differ on doesn't affect the outcome, then it's not going to confound things; it's not going to be that big a deal. Obviously, this is going to be somewhat subjective, because there's a little bit of intuition involved, and this is why it's really important that content experts like everyone in this room are the people evaluating these articles: you're going to have the best sense of what to do.

Here's what I do when assessing balance. This is a little bit like surgery, in the sense that it's not until you go to explain to somebody how you do it that you realize there was actually a little more going on than you thought. Step one, identify which variables matter. Some may have been set by the inclusion or exclusion criteria and are destined to be the same. If not, were they measured, and how? If there are important variables, were they even measured so that you could know whether the groups were balanced? You can't measure what you can't see. Step two, examine the absolute difference for each variable that matters. Step three, consider the strength of the association of those variables with the outcome. Step four, multiply in your mind, somewhat subjectively, somewhat conceptually, how much difference there was between the groups by how much that variable mattered: is it a big deal, a little deal, no deal? Yes, it's subjective, but it gives you a sense of how to interpret differences in the outcomes.

You may say, okay, why are there four more steps? Why are there eight steps? It seems like we're done. Step five, check whether the difference was acknowledged. Did the authors mention it? And please check and make sure they didn't include p-values, especially misleading p-values that could confuse some readers into thinking, oh, it wasn't significant, I guess these groups are the same. Again, that's irrelevant. Step six, check whether anything was done to address the imbalance; there are things that can be done, like statistical adjustments and stratified analyses. Step seven, check whether any of those adjustments were pre-specified in the registration and in how the trial was planned. Step eight, consider whether the adjustment was enough. Sometimes you may end up with groups that are so imbalanced that trying to statistically adjust may not be enough. Again, a little bit subjective.

Someone may say, why step seven? Do you really need to go back and check whether this was registered? That gets us to the second part of this talk: primary outcome hypothesis testing and trial registration. What are trials about? Are we doing randomized controlled trials because we want to know whether injecting some PRP at the end of a meniscal repair is associated with a 3% better chance that it heals versus a 7% better chance? Or are we interested in doing the trial to know whether or not we should be injecting the PRP? In the vast majority of the trials we're doing, we're not interested in prediction, we're interested in inference. We're interested in taking that data and knowing: should we be doing PRP injections at the end of the meniscal repair, yes or no? So it's about inference, and because it's about inference, hypothesis testing is paramount: we're going to test a hypothesis, and we're going to reject it or not, and that's the deal. The decision to reject the hypothesis is what this whole thing hinges on. So you have to protect type I error; you have to protect the alpha. Dr. Carey did a very nice job of discussing this in the last lecture, but alpha can get inflated very quickly, and we're going to go through some examples.

All right, the not-so-primary outcome. Everyone says, oh yeah, I know what the primary outcome of the study was: it was the KOOS score. Okay, great. Was it the total score or a domain score? Was that specified? Each of these patient-reported outcome measures quickly devolves into six subscales. Or even something like, we were interested in whether patients had complications. Well, which complication? Complications can easily devolve into six outcomes of their own. Time point: do you want to know about the KOOS score at three months or at two years? All of these are different. The two groups may do the same at three months and then differ at two years; those are two different outcomes. Raw score versus change score. Assume a distribution or dichotomize: do you want to know the absolute difference, or the proportion in each group who reached a minimal clinically important difference? Those are different outcomes. Missing data and imputation: did you analyze only the patients who actually had the data, or did you analyze those patients plus imputed data for the others? Those are two different things.

Alpha inflation and multiple hypothesis tests: this was already covered a little, so I'm not going to belabor it, but you only want one test. You have one primary outcome, and you get to test it once. That's how you protect alpha; that's how you keep the type I error rate at .05. So what if you do adjusted versus unadjusted analyses? Now you've taken that same outcome and tested it twice. Or maybe more: if one model adjusts for two variables and the next model adjusts for five, now you've done it three times. Management of baseline scores: did you then do another model where you incorporated those? You've got to make sure the analysis is consistent with the primary outcome and consistent with the power analysis; you don't want to do a power analysis for one outcome and then have a different outcome be the primary.

So what are we not seeing? In these studies, you only see what the authors report, but there could have been a bunch of other stuff done behind the scenes that inflated the alpha in ways you don't know about. Think back to the database studies: assuming everyone is being completely forthright and kept track of what they did, you know what they did, but you don't know how many other investigators have gone through those databases. You can only know what you see. So the trial has to be registered. That's how you keep things transparent, that's how you keep track of what the primary outcome really was and how it was supposed to be tested, and that's how you avoid the possibility that things were tested a million times. The planned hypothesis should be consistent, the power analysis consistent; everything needs to be specific, and everything needs to be consistent. You do that, you protect alpha.
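[Editor's illustration, not part of the talk.] As a back-of-the-envelope sketch of how quickly alpha inflates when a single "primary" outcome is quietly tested several times (an unadjusted model, a model adjusted for two covariates, a model adjusted for five, and so on), here is a short Python snippet. It assumes independent tests at alpha = 0.05; repeated analyses of the same outcome are correlated, so this is a rough upper bound, but the direction is the same. The last line reproduces the 20-test, roughly 64 percent figure that comes up again in the Q&A.

```python
# Family-wise type I error if one "primary" outcome is actually tested k times,
# each at alpha = 0.05, assuming (for simplicity) independent tests.
alpha = 0.05
for k in (1, 2, 3, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:>2} tests -> chance of at least one false positive = {fwer:.0%}")
```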
Inflation is everywhere, as we all know too well. I cannot answer the question of how we limit inflation without causing a recession, but how do we limit alpha inflation? We can. Trial registration is critically important. We're very lucky: Dr. Ryder was one of the earliest people to bring this topic to the forefront in sports medicine and orthopedics, back in 2012. He wrote a very nice editorial on it, and AJSM was one of the first journals to have a policy requiring trial registration. A few years later, some of the other journals followed suit. It is important that everyone does it, because if only one journal requires it, you can just take the non-registered trials to the other journals. So it is a team effort.

But even with this, is it enough? Registering trials is great. This is a nice article from 2012; I think it remains relevant, and sadly I don't think there's really been any change in this data. They looked back at randomized controlled trials that were registered and had been completed for at least three years: 40% had not been published yet, and 80% deviated from the registration. When they read the article, it didn't match up with how the trial was registered. This is just the reality. It is humbling, and it's a little sad, but this is where we are in 2022. Can we do better? I don't know, maybe. This is the part of the talk that's maybe a little controversial, but is anonymity a barrier to the review process? For randomized controlled trials, should we unblind reviewers so they can check trial registrations and there are other sets of eyes looking at those things? I don't know, maybe. But that doesn't solve the question of who's checking for unpublished research. Who does that? Really no one. And that's one of the challenging things in orthopedics: the FDA is not as heavily involved in some of our stuff. It's not like a new beta blocker, where you can't prescribe it until it's approved by the FDA. So I'm not really sure how we solve that.

All right, this gets to the next issue: spin. Why are we using randomized controlled trials? It's for inference; it's to compare two treatments. You estimate a relative treatment effect. That's the whole point: how did one group do relative to the other? It's not how each group did on its own; it's how the groups did relative to one another. So you've got to think about what the comparison group was. There are a lot of randomized controlled trials where there are really two experimental treatments, and that's okay, but that trial is going to tell you how those treatments do relative to one another. It doesn't really tell you anything about how those treatments do relative to treatments that weren't in the trial, and that is an important distinction. And this is why we focused so much on balance and protecting the alpha, limiting the type I error rate: it's all about protecting this between-group comparison.

You might say, wait, isn't an RCT really just two simultaneous cohorts? Can't I interpret it that way if I want to? Maybe. Technically that is true: you did take some patients, you did something, and you followed them, so you could ignore the fact that it was an RCT. I would be cautious. RCTs often have stricter inclusion criteria, so the data may not be as generalizable as you thought it was going to be. The Hawthorne effect: when people are in a trial, they're studied much more closely than they are in real life, and their treatment may be much more protocolized than it would be in real life, so it may not mimic cohort data as well. And cost often limits follow-up, so these studies are usually going to be smaller and have more limited follow-up than some cohort studies would. So the best takeaway here is that if you choose to go down the route of interpreting some of the patients from a randomized controlled trial as a cohort, this is no longer level one evidence. It is essentially a cohort, and in many cases, I would argue, not a great cohort.

All right, but haven't I and other people advocated that we should do a bunch of extra analyses with RCTs? This is confusing: you're making such a big deal out of one outcome and one hypothesis test, but then all these people are talking about doing tons of extra analyses. It's true, you do want to make the most of the data that you can, but you've got to distinguish the primary report from all of the secondary reports. Only the primary report is level one. Everything else could be good, useful, hypothesis-generating, but it is not the primary outcome; it is not the level one evidence emerging from that randomized controlled trial.

All right, this gets us into the last topic: machine learning algorithms and RCT data. This is a hot topic. Everyone's trying to figure out when we should use machine learning. I'm not going to be able to answer that for you, and I do not know whether this will turn out to have been revolutionary for orthopedics. RCTs are high-quality data, generally prospective, and usually have a clinically important outcome, so why wouldn't we apply machine learning algorithms? There are probably going to be some benefits to doing that. I'm just going to go over a few questions to consider, a few cautions. One: is the sample size large enough? We talked about how RCTs tend to be small; it's just the reality of doing them, they're expensive. They're designed to detect a difference in a primary outcome; they're designed for inference, not prediction. Machine learning algorithms are sort of the opposite: they capitalize on having an immense amount of data. So it's not clear to me that the sample size in most RCTs is going to line up with what's necessary for these algorithms to work well. I would just be cautious, first, on the sample size. Next: is the population heterogeneous enough? One of the ways randomized controlled trials become more efficient and limit costs is through strict inclusion and exclusion criteria, trying to minimize the variability going in. But when you think about building a prediction model, you want a bunch of variation so that it can predict other variation, and if you didn't start with much variation, it's going to be awfully tough to use that data to develop some kind of really great prediction model. And then let's say you do develop a model, and you think it has pretty good predictive ability, but it's from a randomized controlled trial.
Well, if you then take that model into the real world, where the patients are a lot different, you might see that it doesn't perform so well. So it would be critical, even more so here, to want to see external data used to assess the generalizability and the performance of those models. And then finally, and this is unclear to me, what do you do about the treatment? Let's say the treatment in the trial worked, and you're using machine learning to build a predictive model: do you only include the treated group? Also, what if in the real world selection bias in who gets treated is actually an important predictor of the outcome? The RCT data is never going to be able to incorporate that, because people were randomly assigned to the treatment; they didn't self-select into it with the help of their surgeon. And the treatment affects the outcome, so should models be stratified by treatment? There's a lot that's unclear here.

All right, so just to recap. Assess balance after randomization, and do not use p-values. In a randomized controlled trial, you're going to randomize patients into two treatment groups; just make sure those groups look similar and comparable after the randomization, because it is not a guarantee that they will be, especially for smaller trials. Respect the primary outcome. You want to avoid inflation of alpha, inflation of the type I error rate, and you do that by being consistent and specific; the primary outcome should be as specific as possible. Focus on between-group comparisons with RCTs. That is the point of the RCT. It's not to say both groups do great; you could do a cohort study for that and have a more generalizable group of patients. The point of the RCT is between-group differences. And finally, it's important that we make the most of RCT data, but we should probably be careful.

This talk did not discuss a lot of things, such as different trial designs like non-inferiority trials. I would say those are not super common yet in orthopedics; of the 86 studies we reviewed, 85 were superiority, sort of traditional RCTs. In the future that may be a bigger issue, along with cluster randomized controlled trials and things like that. We also didn't discuss strategies for multiple primary outcomes. Dr. Carey had started to get into this a little bit, but the FDA has issued guidance on it, and some pharmaceutical companies are doing this, where you can actually set up the trial in advance to have multiple primary outcomes. That's not really happening yet in orthopedics, but it may at some point. Treatment heterogeneity: does the treatment work as well for men and women, for old and young? We didn't talk about that, but it's an increasingly hot topic and important to consider. And then mediation analysis, why treatments work, is something you can explore using RCT data. All right, thank you.

Yes, please. No, so the issue is, what if you had 20 secondary outcomes? If you go back to Dr. Carey's example, there's a 64% chance that one of those 20 is going to be statistically significant. Well, you can't... yes, you could, but everything would have to be pre-specified. And then I would say they're not secondary outcomes; I would say that's an example where you have multiple primary outcomes. You can do that, and you can keep your alpha, your type I error rate, at 0.05. There are a couple of ways to do it. One, you can say, listen, we're going to have two primary outcomes and we're going to test each at 0.025. Or you could say, we're going to have two primary outcomes, we're going to test outcome A first, and then, if and only if outcome A is statistically significant, we'll move on to outcome B. It's like gatekeeping. There are novel techniques being actively worked on to do that, to try to make the most of this data. But if it wasn't pre-specified, and somebody just said, hey, we've got this one primary outcome, this is how we're going to test it, this is what we're doing, and then we've got all these other things, the other things are not the level one evidence. They're interesting, they're hypothesis-generating, they're add-ons. You do, for sure, want to pull as much out of the data as you can. But again, you've got to remember what the point of doing the trial is. In my opinion it's largely for inference: you're trying to make a decision. Are we going to start injecting all the patients we do a meniscal repair on with PRP, yes or no? You've got to make a decision, and usually the way we set the trials up is that we make the decision based on the primary outcome. I know that seems like a pretty rigid stance. A lot of this is based on, maybe purposeful, often not purposeful, malpractice or poor use of research in the past. But look at the FDA: the pharmaceutical companies and the FDA go back and forth about what the outcome is going to be, what's going to define whether the drug gets approved or not, and then that's it. If those companies could say, oh, the primary one didn't work, but look at all these secondary outcomes that were better, we'd have a lot more drugs on the market. That's a great question, though.
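[Editor's illustration, not part of the talk.] A minimal simulation of the two strategies just described, splitting alpha across two pre-specified primary outcomes versus hierarchical gatekeeping, under the simplifying assumptions that neither outcome has a true treatment effect and the two outcomes are independent. The variable names and simulation size are illustrative choices, not the speaker's.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, alpha = 200_000, 0.05

# Two primary outcomes with NO true treatment effect: their p-values
# are uniform on [0, 1]. (Simplifying assumption: independent outcomes.)
p_a = rng.uniform(size=n_sims)
p_b = rng.uniform(size=n_sims)

# Naive: test both outcomes at 0.05 each -> inflated family-wise error.
naive = np.mean((p_a < alpha) | (p_b < alpha))

# Split alpha: two primary outcomes, each tested at 0.025.
split = np.mean((p_a < alpha / 2) | (p_b < alpha / 2))

# Gatekeeping: outcome B is only tested if outcome A is significant,
# so any false rejection has to start with A.
reject_a = p_a < alpha
reject_b_given_a = reject_a & (p_b < alpha)
gate = np.mean(reject_a | reject_b_given_a)

print(f"both at 0.05 (naive):        FWER ~ {naive:.3f}")
print(f"each at 0.025 (split alpha): FWER ~ {split:.3f}")
print(f"gatekeeping (A then B):      FWER ~ {gate:.3f}")
```

Testing both outcomes at 0.05 roughly doubles the false-positive rate, while either pre-specified strategy keeps the family-wise rate at or below 0.05.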
Yes, please. Yeah, I would echo some of what Dr. Carey said. There are certain things that aren't going to be studied with a randomized controlled trial. If an outcome is really rare, like an infection after a procedure that was done after a steroid injection, that's so rare, how would we ever randomize enough people to study it? There are also some things you're not going to want to randomize patients to. I think those can be well studied with a retrospective cohort, but again, for some questions you're never going to get true level one evidence, and the best you're going to have is a retrospective cohort. If it is a question that could be studied and would be appropriate for an RCT, then yes, I think you're less enthusiastic about the retrospective study, and you say, hey, this is interesting, this is the data you would use to justify the RCT. But again, there are examples of things you're just not going to be able to study prospectively.

Yes? Sure, yeah: all of them are being checked internally against their trial registration. And again, that's the other reason. I can assure you that even though that article is from 2012, it is still common that the report and the registration don't match. In the medical world, with beta blockers and statins and all that, I think sometimes people may have been purposely trying to play fast and loose. In orthopedics, most of the time when the report doesn't match up with the registration, it's usually that somebody's fellow five years ago registered the trial, then the new fellow wrote the study up, and at some point along the way people forget. I don't think it's usually intentional, but it's still important that we try to keep things as specific and consistent as we can.

Yes, please. Yeah, so basically, somebody does a randomized controlled trial and they're writing up the results, and in Table 1 of the study they'll say, these were the characteristics of the group that got the PRP, these are the characteristics of the group that didn't get the PRP, and here's a p-value comparing those characteristics between the groups. That p-value is meaningless, because what matters is the absolute difference between the groups, not whether it was statistically significant. Maybe it was a small trial, and maybe the groups were really different, but those differences didn't quite reach statistical significance; the groups are still different. The p-value is meaningless. Also, we know for sure that the patients who got the PRP and the patients who didn't came from the same population, because we randomized them from that population, so the hypothesis itself doesn't really make sense.

Yes, it is definitely subjective, unfortunately. But one thing that is nice about orthopedics and a lot of clinical specialties is that we have a sense, or at least we think we do, of what variables are predictive of the outcome. So we should be measuring those, so that we can look to see if the groups are balanced, and because we know what we think is predictive of the outcome, we can assess how big a deal an imbalance is. It is subjective, but that doesn't mean it's wrong. Let's say you randomize patients and one group ends up predominantly professional athletes and the other group ends up predominantly high school athletes. Well, if the outcome is return to sport, I don't know, maybe that matters a little bit; whereas if it's some other outcome, maybe it doesn't. So again, that is subjective, but you still have to do it.

Yes? Ask them to remove the p-values from Table 1? Yeah, because, again, a lot of readers are going to know they should ignore those, but there are some who may not, and they may be misled into thinking, oh, the groups were the same because the p-values weren't statistically significant, when in fact the groups were different. So yes, you just say, hey, listen, it doesn't really make sense why you have p-values in Table 1; I would recommend removing them.

Yes, please. No, related to that: if you have Table 1 p-values, every now and then there'll be a study that actually does have a significant difference. I think that's really important to know. So that, to me, is a value of having p-values there.

Yeah, I would just say I think the important thing is the absolute difference between the groups. I don't think knowing whether it was statistically significant adds a lot beyond knowing the absolute difference. Because if the absolute difference is legitimate, and you think that the fact that these groups differed in that way could affect how their outcomes differ, then I don't see why knowing it's statistically significant or not would matter a lot.

Yes, please. You said to assess the potentially imbalanced factor on its effect on the primary outcome, but you also said that that's largely subjective. How do you assess that without a statistical test? And doesn't that run into the same problems you were just talking about?

Yeah, it is subjective, for sure. We randomize patients because we want the groups to be similar, but when you don't have an infinite population you're randomizing, the groups may not be. You're going to get unlucky sometimes, and especially in smaller trials, there's a reasonable chance you may get unlucky. And then, yeah, if it's a variable that doesn't affect the outcome, it doesn't matter if the groups differ.

But the only way you can check for that is with a statistical test.

No, no. You look at the absolute difference. So you say, listen, in one of these groups, just by chance, the average age was 38; in the other group the average age was 22. I think age affects this outcome of patellar redislocation, so I'm going to have to be critical in assessing the outcome, because I think the groups weren't balanced, and I think that imbalance between the groups may be what's driving the difference in the outcome, not the treatment.

But that sounds more like a hypothesis than a conclusion, right? Like, I think that the imbalance in the age groups is affecting it.

Yeah, it is. It is. Again, it is subjective, and that's why you check to see whether the authors acknowledged the difference and what their thoughts on it were. Did they then try to do adjustments for it? Especially for a smaller trial. This is a little beyond this talk, but for smaller trials, it's increasingly done that the plan pre-specifies that you're going to include these variables. Also, if you know you're doing a small trial and you know a variable is strongly associated with the outcome, you need to stratify the randomization by that variable.
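[Editor's illustration, not part of the talk.] Here is a minimal sketch of what stratified, permuted-block randomization might look like for one strongly prognostic variable. The competition-level example, block size, and function name are illustrative assumptions, not the speaker's protocol.

```python
import random
from collections import defaultdict

random.seed(42)

def stratified_block_assign(patients, block_size=4):
    """Permuted-block randomization within each stratum: every block holds
    equal numbers of A and B, so the arms stay balanced on the stratifier."""
    open_block = defaultdict(list)   # remaining assignments per stratum
    arms = {}
    for pid, stratum in patients:
        if not open_block[stratum]:
            block = ["A", "B"] * (block_size // 2)
            random.shuffle(block)
            open_block[stratum] = block
        arms[pid] = open_block[stratum].pop()
    return arms

# Hypothetical trial where competition level is strongly prognostic for return to sport.
patients = [(i, "professional" if i % 3 == 0 else "high_school") for i in range(60)]
arms = stratified_block_assign(patients)

counts = defaultdict(lambda: {"A": 0, "B": 0})
for pid, stratum in patients:
    counts[stratum][arms[pid]] += 1
print(dict(counts))   # each stratum contributes equal numbers to arm A and arm B
```

Because every block within a stratum contains equal numbers of each arm, the groups end up balanced on the stratification variable by construction rather than by luck.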
If you do that, you're guaranteed to get groups that are similar on that variable. So usually, in a well-done trial that was pretty well thought out, this is probably not going to be a big issue all the time. But it should still be checked for.

Yes. In that example, if you found the difference, that one group was older than the other, could you encourage the authors to stratify their comparisons by age?

For sure. That's definitely a way you could look at it, and it's optimal if that was pre-specified. Because if it's not, then you get into the issue of, okay, now we've tested one hypothesis three different times, so we've actually done three hypothesis tests. That's where it can be helpful to pre-specify. Some trials will say, listen, we know there's a chance the groups are not going to be balanced; if the groups differ on one of these factors by more than half a standard deviation, then we're going to do a model and include that variable in it. And that's how they keep to one outcome, one hypothesis test.

Yes? So for reviewers who are trying to decide whether to reject a paper, send it back for revision, or accept it: you see a paper that has a compelling outcome, and there are randomization issues, and the authors address those issues, but it's still not perfectly randomized or balanced. What would you say? Is that appropriate for publication in the HSS or elsewhere?

I think it depends. But no study is going to be perfect, right? I think we want to recognize the imperfections in the study, but just because there are small imperfections doesn't mean something shouldn't be published. It does mean they should be noted, because then the authors have a chance in the discussion to mention it and say, hey, listen, this variable is sort of associated with the outcome, but not that strongly, and we then did some secondary analyses that didn't really support that things would have changed. And then, you know, buyer beware. But if you don't point it out, then there's no chance to know. Awesome, thank you very much.

Thank you, David. Both of these presentations have been great. You know, I started doing these seminars 20 years ago because I realized there was so much that I didn't know and I wanted to learn more. And I thought, well, maybe our reviewers don't know more than me, so maybe they want to learn these things, too. And this is why we have these associate editors who amplify the knowledge that we have when we evaluate the papers. I see a number of our others here: Dr. Ganley, Dr. Fleming, Dr. Foster, and I think Dr. Washer is around, too. They all bring something special to the journal, and you do, too, so I want to thank you. So many of you are reviewers, and you came here today, and it's great to see this room full of people who want to learn to be even better. We have so many wonderful reviewers, and yet you want to get better, which is fantastic, and we want to do whatever we can to help you. That's why we have these seminars. And if there are some people who aren't reviewers and would like to be, we have a woman in front here, wearing a red blouse, who is known to many of our reviewers. We'll be happy to sign you up.
So just talk to Donna afterwards, give her your card, and we'll get you reviewing as soon as possible. Before I go, I also want to point out that we're really honored to have the president of the American Academy of Orthopaedic Surgeons, none other than Felix "Buddy" Savoie, here, which is a special honor for us. Thank you, Buddy, for coming. And thank you all for coming. Have a wonderful day. And if you're on the editorial board, we'll be regathering at 1:30 here.
Video Summary
The video begins by discussing the importance of evidence-based medicine and the role of randomized controlled trials (RCTs) in improving clinical care, while emphasizing that not all RCTs are equal and that they must be critically evaluated to ensure the results live up to the level I label. The speaker then delves into issues specific to the review of RCTs, including baseline balance and the inappropriate use of p-values in Table 1, primary outcomes and hypothesis testing, trial registration, between-group versus within-group comparisons, and the potential for spin. He stresses that group balance must be assessed after randomization and that p-values should not be used to do so, and he highlights the importance of respecting the primary outcome and protecting alpha against inflation from multiple hypothesis tests. The application of machine learning algorithms to RCT data is also discussed, with caution advised regarding sample size, the limited heterogeneity of trial populations, and how treatment should be handled in prediction models. The need for trial registration and for addressing potential biases is reiterated, and the speaker concludes that, while imperfect, RCT data should still be used to make the most of the available evidence.
Asset Caption
David C. Landy, MD, PhD
Keywords
evidence-based medicine
randomized control trials
RCTs
clinical care
trial evaluation
machine learning algorithms
group balancing