Reviewing Machine Learning Manuscripts: Are They Really that Different?
Video Transcription
Machine learning manuscripts get submitted, people are asked to peer-review them, and a lot of times reviewers may not feel comfortable. They may say, I don't know a lot about machine learning; I may know a lot about the topic, but the machine learning scares them off from doing the review. So the hope here is to talk enough about machine learning that some folks feel a little more comfortable reviewing these types of manuscripts. The argument I'm going to make is that they may not be as different as you might think at first. Disclosures: editorial board for a couple of journals, committee member for a society, course instructor for a company. The real disclosure is that I am still learning about machine learning along with everybody else. I don't think any of us knows exactly how this is going to play out or when it's going to be truly helpful clinically. So this is a little bit buyer beware: some of this is an oversimplification, and some of it may turn out to be wrong years from now, so take it with that caveat. This talk is going to be relatively basic. First, a brief introduction to machine learning, mainly to make sure we're all on the same page, because I don't think there's any one great definition. Then, what is research in general? I think it's important to consider why people are doing research and what we are looking at, so that we can understand what methods may or may not be appropriate. Then a review of some of the basic math underlying regression, because I really think that's the foundation for interpreting what machine learning is and whether it's really that different from what we're already doing, or from what people already feel comfortable with. Then a review of some basic machine learning algorithms; clearly we're not going to cover everything. And finally, back to basics: what is the goal of peer review, and what are we really looking to accomplish? So first, what is machine learning? There is no one great definition. The first widely cited use of machine learning was actually an attempt to teach a computer to play checkers without giving it the rules. That's clearly not what we're interested in doing, and not what people are using machine learning algorithms for in orthopedic research. At a minimum, though, it's a computer process for prediction; I think most of us can agree on that. Beyond that it varies by field. Most would consider regression, meaning linear regression or logistic regression, to be a machine learning technique. For this talk we're not going to consider those to be machine learning. We're going to say machine learning means you're using a computer for prediction but you've gone beyond more traditional regression. All right, what is not machine learning? People like to describe artificial intelligence as a nesting doll: artificial intelligence is the big thing, and within it are a bunch of other things. Machine learning is one part of artificial intelligence. What we're not going to talk about is natural language processing; that is different. We're not going to talk about computer vision or visual AI; that is different. Can there be overlap in the algorithms used for these things? Yes. But machine learning is really about prediction, and it is one part of AI. All right, so what is research? I would argue you should really think about two different goals here: inference or prediction. For inference, you need confidence, right?
You are looking to learn something; you want to understand whether there is an underlying association, and hypothesis testing is super important. For prediction, you really want accuracy, and the confidence is less important. The weatherman doesn't give you a confidence interval with the prediction of rain. Here's an example of inference: posterior tibial slope and ACL injury. There's been a ton on this. That work really emerged from an interest not in predicting the injury but in understanding it, and it was then used to study reconstruction failures. The idea was that we understood a little about how the ACL might be functioning and what might predispose to injury, and then we went on to use that to guide treatment. Hypothesis testing was very important early on for establishing whether that association really existed, and then things were built up based on those inferences. It was never primarily about prediction; it was about understanding. Here's an example of prediction: professional athlete injuries. It's not necessarily about understanding why they're injured or intervening on it; it's purely about whether we can predict who is going to get injured. There's not a lot of interest in the factors themselves. The predictions could be used to make decisions, but they're not used to build some underlying theory that we then build on. The focus is on accuracy; hypothesis testing is not that important. The question is just, can we have an accurate model? This is where you might think machine learning has some value. Okay, now let's get back to the basics of regression. I think most peer reviewers at this point are relatively comfortable with the idea that they may review an article that includes regression. So what is regression? It's just an equation. It's an equation where you've picked a dependent variable, maybe ACL injury, and you've identified some factors that you think may be associated with it, say age, sex, and tibial slope, and you want to know the influence of those independent variables: how do age, sex, and tibial slope associate with, or predict, the dependent variable? You then fit this equation with data. You collect some outcome data, say six people: did they have an ACL injury? You know their age and sex, and you can measure their tibial slope, and that allows you to calculate the beta values. Then the question is, how does the computer do that? I think this is where a lot of folks start to be less comfortable. They may say, I don't know exactly how the computer does that or how it works. But people are still able to very adequately peer review articles that use regression, even if they don't know the nuance of that next step. As a generalization, what you're doing is trying to reduce the error: you're trying to pick coefficients that make the error as small as possible. And the best way to understand the error is that it's the difference between reality and the prediction. So in the example on the right side of the slide, for the first patient, a 19-year-old girl with 10 degrees of posterior tibial slope, the model might predict a really high risk of an ACL injury, maybe 90 percent, and then she had it.
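As a rough illustration of that fitting step, here is a minimal sketch in Python using scikit-learn. The patients, variable values, and column choices are invented purely for illustration; this is not the speaker's analysis, just the general shape of fitting a regression equation to data and getting a predicted probability back out.

```python
# Minimal sketch: fitting a logistic regression to a handful of made-up patients.
# All numbers here are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: age (years), sex (1 = female, 0 = male), posterior tibial slope (degrees)
X = np.array([
    [19, 1, 10],
    [25, 0, 4],
    [17, 1, 12],
    [30, 0, 6],
    [22, 1, 9],
    [28, 0, 5],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = ACL injury, 0 = no injury

model = LogisticRegression().fit(X, y)

# The fitted coefficients play the role of the betas in the regression equation.
print("coefficients:", model.coef_, "intercept:", model.intercept_)

# Predicted probability of injury for the first patient (the 19-year-old with a
# 10-degree slope); her "error" is just the difference between the observed
# outcome (1) and this predicted probability.
p = model.predict_proba(X[:1])[0, 1]
print("predicted probability of injury:", round(p, 2), "error:", round(1 - p, 2))
```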
So in that case, there's a 0.1 error; the model is not perfect. And the goal of picking those coefficients is to reduce the error of the model. Now, how you do that, there are a lot of different ways. But again, most people, when they're peer reviewing articles, are not trying to say, oh, you used maximum likelihood, or you used this or that. It doesn't inhibit people from peer reviewing. We all have a general sense of what's going on, and the nuance of what was happening inside the computer isn't something people were ever super concerned about. The one nice thing, and the one argument I would leave you with about regression, is that it pairs up very nicely with inference and with describing underlying associations, because for each of the risk factors, or factors of interest, we get a coefficient and we can get a confidence interval. That aligns very nicely with hypothesis testing and the way we traditionally think about research. All right, now let's move on to machine learning, but I want to put it in the context of regression. Let's say we were interested in age and tibial slope for predicting ACL injuries. The red dots are patients who had an ACL injury; the green dots are patients who did not. Say we did a logistic regression; this might be our logistic regression prediction line. The goal of this line is to minimize the distance between the points and the prediction, basically to minimize the error, like we talked about on the last slide. A support vector machine is a machine learning algorithm that differs slightly from logistic regression in that it wants to make the boundary between the points as wide as possible. It's less influenced by outliers, basically, because it's not as interested in the distance from all the points; it's more interested in the boundary that's created. But essentially these are very similar concepts. Then you might say, wait, what if I've got more than two predictors and I don't have a line separating them? It gets much more complicated. If you had three predictors, say slope, age, and sex, now you have a plane trying to separate the points, and it's easy to imagine that with 10 dimensions it would be some kind of crazy shape. That's where you really need a computer, it takes lots of time, and you need tons of data. But the underlying principle is still the same. When you think about the output from a support vector machine, though, it's going to be really tough to take that plane and describe to somebody the exact association between age and ACL injury; that's not going to line up well. But if you were thinking from a purely predictive standpoint, that orange boundary line on the slide looks like a very intuitive and reasonable way to do it.
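As a rough sketch of how similar the two approaches look in practice, here both a logistic regression and a linear support vector machine are fit to the same two predictors. Again, the data, the new patient, and the use of scikit-learn are all assumptions made for illustration, not anything from the talk.

```python
# Sketch: the same two made-up predictors (age, tibial slope) given to a logistic
# regression and to a linear support vector machine. Both learn a linear boundary;
# the SVM picks the one that maximizes the margin around that boundary.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X = np.array([[19, 10], [17, 12], [22, 9], [21, 11],   # injured ("red dots")
              [25, 4], [30, 6], [28, 5], [27, 3]])     # not injured ("green dots")
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

logit = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear").fit(X, y)

new_patient = np.array([[20, 8]])  # a hypothetical new point to classify
print("logistic regression predicts:", logit.predict(new_patient)[0])
print("linear SVM predicts:", svm.predict(new_patient)[0])

# With more than two predictors the boundary becomes a plane or a higher-
# dimensional surface, which is hard to translate back into a simple statement
# about any one variable: fine for prediction, awkward for inference.
```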
All right, so now let's do k-nearest neighbors, another commonly used machine learning algorithm. Here we have a new point, and we want to know how we're going to predict this new patient. Logistic regression would say: they're above the logistic regression prediction line, so it's probably going to be an injury. K-nearest neighbors, in this case with a K of five, is going to draw a boundary around that new point that includes the closest five observations, and then look at those five observations to predict what's going to happen. It would classify the point as green: you draw the boundary and you've got three greens and two reds, so it says it's probably not going to be an injury. So here you can see you could get a difference in the prediction, and maybe k-nearest neighbors is better at this, but it's going to be purely about prediction; it's not going to describe the association between any factor and the outcome. This is also a nice chance to talk about so-called hyperparameters. Much like with regression, where people do make some decisions we don't talk a lot about in our methods sections or in review, especially with multilevel models, about variance structures and covariance structures, and we don't pay a lot of attention to it, it's the same deal in machine learning: there are some hyperparameters that tell the computer what to do, and one of those is the K. I picked five, but what if we had said, I actually want K to be seven? Now it actually changes the outcome here: instead of green, the nearest seven could include four reds, and you go to red. So there is some element where people need to do this appropriately, and then, much like with regression, there's a question of how we are peer reviewing that. Are we even peer reviewing that level of nuance?
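To make that K choice concrete, here is a minimal sketch of the k-nearest neighbors idea in code. The points and the new patient are invented; the only purpose is to show that K is a choice the authors made, and that different values of K can give different answers for the same patient.

```python
# Sketch: the same new patient classified by k-nearest neighbors with two
# different choices of the hyperparameter K. Data are made up for illustration.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[19, 10], [17, 12], [22, 9], [21, 11], [23, 8],   # injured
              [25, 4], [30, 6], [28, 5], [27, 3], [24, 7]])     # not injured
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

new_patient = np.array([[23, 7]])

# The choice of K is a hyperparameter: it is set by the analyst, not learned
# from the data, and it can change the prediction for a borderline patient.
for k in (5, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"K = {k}: predicted class =", knn.predict(new_patient)[0])
```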
All right, decision trees. This is very commonly used and pretty intuitive; it really existed before machine learning. Basically, you take the factors and figure out the splits that optimize the prediction: high tibial slope, yes or no; then maybe you break it down by whether the patient is female; then maybe by whether they're young or not; and that predicts whether there's an injury or not. I think that is pretty intuitive to understand. You may also hear about random forests. That is machine learning, but basically all it is, is a bunch of decision trees put together. In some of the decision trees you force the computer to start the tree with a different variable, and then it combines the trees together, hence the name random forest, and that makes the prediction. But the underlying structure is still the decision tree. And then finally, artificial neural networks. This is the sort of machine learning that would most scare people off. Basically you have variables, slope, sex, age; there's then a hidden layer, meaning some type of manipulation is done to the variables to create a new variable, or node, and then something is done to that node to get to the output. That hidden layer can actually be a bunch of layers, and when you get to five or ten or more, that's when people start talking about deep learning. It's not that anything magical happened; you've just allowed the computer to have a bunch more hidden layers. This is also when you start to get into the visual AI kind of work, where you're trying to have a car figure out whether what it's seeing is a sign or a scooter or whatever's going on. But ultimately, it's easy to see that this is not going to be super helpful for inference, but it might be really good for prediction, because it allows you to do a lot of things you really couldn't do with normal regression. All right, I know that was quite a bit, so some quick takeaways. Machine learning algorithms, I think, are likely to have value; they appear to have some unique advantages for prediction. We didn't discuss all the algorithms, but clearly there is some potential worth exploring. It's not that dissimilar from regression, though; when you really think about it in theory, there are a lot of similarities, and as for some of the nuance that scares us off, if you think hard enough about regression, you could be scared off from that as well. I'm not sure the nuance is really the most important aspect. And remember the intent: with machine learning, it's really about predicting new data. The goal is prediction, not inference. All right, so what about reviewing these manuscripts? Remember the basics. Do not get scared off just because it's machine learning and it feels foreign. What are the inputs, meaning what variables did the model have to work with? What is being predicted? How good is the prediction? And will it be clinically useful? If there are issues with the variables used, with what was being predicted, or with whether the prediction was good, machine learning doesn't make up for that; it's not magically better. So, the inputs. Ideally you would have a large and diverse population; representation and size matter. When you think about building some of these crazy geometric shapes, you need a lot of points. And ideally you would use variables that are commonly measured, easily measured, and reliably measured, and beware of unique or expensive variables. Why is this so important? The goal here is prediction, and the goal is applying the model to new data. If you've incorporated a variable that no one else measures, how is that possibly going to be useful for new data? If no one else is measuring the variable, how are they going to apply your algorithm? That's probably not going to be useful. If it's such a rare variable, maybe you should have been in the inference camp: maybe you should have been trying to better understand and prove that it's associated with outcomes and is useful, so that other people start measuring it, so that it could then be used in a machine learning algorithm. All right, the prediction: what are you predicting? You want an important outcome. Using a machine learning algorithm to accurately predict something that doesn't matter, doesn't matter. Just because a machine learning algorithm did it, if the outcome isn't important, it doesn't matter. And it's not going to fix issues with floor or ceiling effects of PROMs. If the outcome fails to measure the truth, it's not a good outcome, and machine learning doesn't fix that.
It just got really good at predicting an outcome that isn't good, but it doesn't fix the outcome not being good, and if the outcome isn't important, it isn't important. This is why I think it's so important for orthopedic surgeons, and the people in this room, to be the ones helping with the peer review of these articles. If we just send them all to statisticians, they are not necessarily going to know what the most important outcome is for an ACL reconstruction. They're not going to know the nuances of whether there may be floor or ceiling effects with some of these PROMs, or how people are using them in the clinic, or which input variables were reasonable or not. Most of reviewing these articles is not about the machine learning; it's about the content expertise. All right, so good prediction. What measures of accuracy are provided? Was the data split into training and testing datasets? You see this a lot: machine learning beats logistic regression. That is not important. That is only important if this is a statistics journal ten years ago and you're trying to prove that machine learning algorithms can be better than logistic regression. I don't think that's what anyone in this room, or any of the readers of our journals, is interested in reading. That's something someone developing a new machine learning algorithm might care about, but not us. So the fact that a machine learning algorithm does or does not outperform logistic regression really does not matter. What matters is whether it was good at predicting, because if it was slightly better than logistic regression but both were bad, both are still bad, and it is still probably not going to be useful. Do not get lulled into the idea that because it outperformed the logistic regression, it must be good. No: if the AUC, the area under the curve, the C-statistic, is still very low, it's still not good. Here's an example, and not to single anyone out; there is an infinite number of these examples, I can promise you. You look at the accuracy: it's 60 percent. Area under the curve: 0.64, 0.65, 0.54. This is barely better than guessing. But the takeaway from a lot of these articles is, oh, the random forest beat the logistic regression, 0.65 to 0.64. They're all not good. None of these is going to be clinically useful, and you don't need to understand anything about machine learning to see that. Now, it might be interesting, if the inputs were really good, to say: with the inputs we currently have, this is not readily predictable. There's still potentially some value there, but you don't need to know much of anything about machine learning to know that model isn't going to be clinically helpful. Okay, and we do external validation to know how good the prediction really is. Even if something predicts really well, even if they split the data into training and testing and it predicts really well in the testing dataset, it still needs to be externally validated. The way the inputs were gathered in the next dataset may be different; the population may be different. If I develop a machine learning algorithm using 1,000 NBA players and then try to apply it to a bunch of high school kids, it may or may not perform as well.
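As a minimal sketch of what a train-test split and AUC comparison might look like in code, here is a hypothetical example on synthetic data. The data, features, and numbers are invented; the point is only the mechanics of holding out a test set and reporting AUC for both a logistic regression and a random forest, where a tiny edge for one model means nothing if both AUCs are mediocre.

```python
# Sketch: split the data into training and testing sets, fit logistic regression
# and a random forest, and compare test-set AUC. Synthetic data for illustration;
# "random forest beat logistic regression" is uninteresting if both AUCs are poor.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                               # five made-up predictors
y = (0.3 * X[:, 0] + rng.normal(size=1000) > 0).astype(int)  # weakly related outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

for name, model in [("logistic regression", LogisticRegression()),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: test AUC = {auc:.2f}")
```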
And if the goal is prediction with new data, you've got to get new data and see how it does. I think that's something that, across our field and across most medical fields, we don't do well yet. Everyone publishes the paper, great, and then you just move on and forget about it. But machine learning is probably going to be most helpful long term when more data keep being fed into it and the algorithm keeps getting better over time, and we don't really have a good way in our current research climate to make that happen. It's possible with registries, where data are continually being fed in; that may be an area where machine learning really flourishes, where there's someone paid to maintain the algorithm and keep it going, and it's not about having published a paper but about overseeing an algorithm because it has some kind of real benefit. All right, clinical usefulness. Again, this is where orthopedic surgeons reviewing these papers are super helpful. Is prediction even valuable in this clinical situation? Predicting something you can't intervene on is maybe not going to be that helpful. Is the accuracy sufficient for the context? There are some contexts, like taking somebody off life support, where you probably want to be pretty accurate, whereas for other things you may be willing to sacrifice some accuracy. To what extent can it be validated? Is it practical to incorporate the tool clinically? If somebody has a machine learning algorithm with 30 variables that you have to hand-enter, or have to pull from multiple different sources, you're probably not going to use it. And is there going to be continued monitoring? Again, I think that's where we currently do very poorly. So, clinical usefulness. And then, when do you ask for help? I think that's one question I still have, and a lot of folks have. It's like this picture on the slide: it looks fine at first, and then, oh no, you don't know what you don't know, and there's a lot more going on. It can be challenging sometimes to know when you need to look for more. Do the results seem too good to be true? If things seem too good to be true, that, to me, is a time when you might say, maybe we need to look into how they split up the testing and training data, what the hyperparameters were, things like that. If you don't understand the basics, and everyone in this room is a very smart person, if you don't understand the basics of what the inputs were and what the outcome was, that's an author problem, and they should be able to explain it to you, because if they can't explain it to you, what reader is possibly going to take this paper and make use of it? That's just not going to work. Questions about the hyperparameters: if you came across something there that seemed really strange, that would be a good reason to want to reach out. And some of this is why validation is so important. Let's say somebody picked really bad hyperparameters and did all of this wrong, but it turns out the model is super predictive, even in new datasets. Well, then they probably didn't actually do it all wrong, or maybe they got lucky, but if the model continues to perform well with new data, it's probably a good model.
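As a rough sketch of what that kind of external check might look like, here is a hypothetical example of taking an already-fitted model and scoring it, untouched, on data collected somewhere else. The file names, column names, and the use of joblib and scikit-learn are all placeholders and assumptions, not anything described in the talk.

```python
# Sketch: external validation -- load the frozen, previously published model and
# score it on a cohort gathered at a different site. "acl_injury_model.joblib"
# and "external_cohort.csv" are placeholders, not real files.
import joblib
import pandas as pd
from sklearn.metrics import roc_auc_score

model = joblib.load("acl_injury_model.joblib")      # the model exactly as published
external = pd.read_csv("external_cohort.csv")       # new patients, new setting

X_new = external[["age", "sex", "tibial_slope"]]    # the same inputs the model expects
y_new = external["acl_injury"]

auc = roc_auc_score(y_new, model.predict_proba(X_new)[:, 1])
print(f"External-validation AUC: {auc:.2f}")

# If performance holds up here, and ideally keeps being monitored as more data
# arrive, the model is doing what a prediction model is supposed to do.
```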
And I think, ultimately, that's why validation by applying these models to new datasets is going to be so important. So, just to recap, I know we went over quite a bit. The research purpose matters, and it matters a lot. Is this a paper about inference, where they're looking to better understand whether an association exists, or is it purely about prediction? Machine learning has real promise, especially for prediction; in instances where we're purely interested in prediction, I think there's probably a lot of promise for machine learning. And remember the basics: poor prediction is poor prediction, and there's no such thing as magic. If something seems too good to be true, I would ask for help. When meta-analysis first came out, there were people who would say that even though the articles being meta-analyzed didn't control for certain things, the meta-analysis would just control for it. That's crazy; that can't happen. The same goes for certain things with machine learning: if no one measured the outcome, the model is probably not just going to guess it. And for those interested in more, I think there was a really nice primer by Prim and their group published last year. Machine Learning for Absolute Beginners is a nice book, The Hundred-Page Machine Learning Book is a nice book, and Mathematics for Machine Learning is as well. These are all, I think, very good books that, even with a limited math background, as long as you have a tiny bit of an understanding of linear regression, you'll be able to make your way through. The Mathematics for Machine Learning book doesn't have quite as good pictures, but the other two have very nice pictures to help explain things. All right, thank you, and any questions?

[Audience question, partially inaudible, about how much data these algorithms need and whether the authors can simply pick the variables they want.]

Yeah, that is a great question. There is not a great answer; it's a little bit machine-learning-algorithm specific. Some of them need a lot more observations than others. I would say that, in general, most of the datasets we're working with, which are 1,000 or 2,000 patients or fewer, are not really the ideal size for these algorithms to get optimally good. They're most helpful when you think about Amazon, websites, fraud detection, where there are millions of logins to the website and all this random data that no one understands, like IP address, how quickly they click the second link, things like that. So, unfortunately, there's not an exact minimum. As far as the variable selection goes: you have a list of variables that you give to the model, and it should select out the ones it finds useful, and then any subsequent work with that model needs to have those variables.

[Audience follow-up:] How do you know which variable is useful and which one isn't?

I think the nice thing about machine learning, people would argue, is that you don't need to know which are good. You just give it all the information you have, and it will sort it out. And that can also be a problem; unfortunately, there's a limited amount of time here, but that's probably, to me, the scariest part about machine learning: it may do things with the variables that we would not find equitable or that we would not want it to do.

[Audience follow-up:] But that's my point; how do we get better at it, if the machine has done it for us?

That's exactly it. So that's the deal, right?
If your interest is better understanding and you want inference, this is not the way. This is not the way for inference; this is the way for pure prediction. If you want to predict who is going to no-show your clinic, and the model can include all this stuff like zip code, whether they no-showed their rheumatologist three years ago, whether they're late on their credit card payment, that might be an optimal way to predict no-shows. Okay, thanks. Yes, in front.

[Audience question, inaudible, about how critical reviewers should be when machine learning may not have been appropriate or really needed for the study.]

Yeah, that is a very good question. I do not currently make a big deal out of that, because I think there's a lot of interest in machine learning right now, and in people learning about it themselves and seeing what it can do, what it can't do, and how it works. The one time to be pretty critical, though, is when folks are using machine learning and then want to spend the whole paper talking about inference. That doesn't make sense. If you were interested in inference, walk me through it; walk me through the univariable to the multivariable analysis and what you think is going on. So that's where I think it's fair, and we should all be critical: when the methods don't match the purpose. That's a great question, thank you.

[Audience question, partially inaudible, about whether inference and prediction can really be separated in the clinical realm if, at the end of the day, we don't understand what drives a given methodology.]

Yeah, that's a very good question. So the question, and correct me if I'm paraphrasing it badly, is basically that there's more overlap between inference and prediction than the talk suggested, and might there still be a role for using machine learning techniques for inference? Is that fair? Okay. So I agree; the talk is an oversimplification, for sure. There is certainly a lot of overlap between inference and prediction. Just look at logistic regression models: traditionally that's what we've been using to build prediction models, and they're also pretty good for inference. Certainly, machine learning algorithms may figure out a way to detect subtle underlying associations we would have missed. And currently, and more so in the future, I think these algorithms are going to get better, or people are going to do a better job of figuring out ways for us to visualize what's happening inside the algorithm, so that we could maybe work backwards in some instances to do inference. But that's pretty far off. I would say that even if machine learning could never do inference, there will still be tremendous value, especially as we move more toward payers, the government, and others being interested in our outcomes, and we have a need to adjust for our case complexity, our patient populations, and things like that, where we really just have an interest in prediction. I'm not interested in modifying who I treat; I'm not going to change my hospital; I just need Medicare, or whoever, to be able to adjust for these factors we're working with. I'm interested in prediction only, and so if the machine learning model can do that better, well, look at Europe; they're already trying to give individual surgeons scorecards and things like that.
I think that's one area where, even if there were never inference, there are these instances where prediction, just prediction for prediction's sake, could still be helpful. Yes, please.

[Audience comment and question:] So, I have a smart anesthesiologist, and I've dabbled in this a little bit and published a few papers. To me, the value of this has always been not the paper but the calculator, right? A couple of years ago, CORR instituted a policy where, if you're publishing a machine learning paper, you have to have a freely available online calculator that actually allows you to make the prediction, and I thought that was really smart. But then I see these other papers describing a predictive model where there's no way to use it, because they didn't build an app, they didn't put a website up. Does AJSM or OJSM have any policy that requires that? Because without the predictive calculator, the paper itself is useless to guys like me who don't do math.

Yeah, and also, without the calculator or a link to it, no one can use it; there's no equation. It's not like logistic regression, where you could build your own spreadsheet. No, there's not such a policy, but I would say this: I think it's good that there was a policy, and it's not enough, and I don't think any of us knows what enough is. There are a lot of papers out there where you can click on the link, get to the calculator, try to validate the calculator, realize it doesn't validate that well, email the authors, and they say, oh yeah, it didn't go that well, but we've moved on to a new calculator; that new calculator is now proprietary and we actually don't want to publish it, but if you want to use it, send us your data and we'll run it for you and send you back the predictions. And you're like, well, how would that work? That's crazy. So I think that is a big-picture issue that is going to take more than journals to get around, and it's one of the reasons I am probably more critical of machine learning being all that helpful in this more traditional research sense, and why I think it's going to be most helpful if it somehow gets folded into registries or things like that, where the point is not to publish a one-off paper but to have something that a group of surgeons finds useful and wants to make better and better over time. That is a very good question. Yes, please.

[Audience question:] So, focusing on prediction: for some of these questions you can do it either with regression or with machine learning. Why is it that you require external validation for machine learning?

No, we're not requiring external validation. I would just say that external validation is always important if you're trying to do prediction, whether with logistic regression or not, but it may be especially important for machine learning, where there may be more uncertainty around the hyperparameter selection and some of these issues.

[Audience follow-up:] Because even with logistic regression or any traditional stats, the authors mention some of the details but not all of them, like how the data were annotated, and it's not that common for people to ask for external validation of traditional stats.

Yeah, and there's not a two-tier process. It's the same.
I would just say the point here was to emphasize the importance of external validation of these machine learning algorithms, especially given that they're solely focused on prediction. But I completely agree: if somebody writes an article that uses logistic regression and is solely interested in prediction, it's equally important that it then be validated. Completely agree. Yes, please.

[Audience question:] Should the data always be divided into training and testing sets?

Yes, with a caveat; there's always a caveat. Sometimes you don't have enough data, and then there are ways to do bootstrapping and things like that to get around it. But realistically, yes, there should always be some amount of the data saved for testing. And then, realistically, the reason I probably wouldn't make as big a deal out of exactly how it was separated, whether it was some kind of bootstrapping approach or a true held-out test set, is that there has got to be new, external data. Because if the goal was to create something purely for prediction, you've got to show that it can actually predict. And I think, really, for these things to be relevant, and we're probably a ways away from having many that are clinically relevant, that's what's going to be required long term.

[Audience question:] Quick question. What's your gold standard for external validation? Is it a different demographic, different surgeons, different institutions?

Yeah, I don't think there is a gold standard. There's so little external validation happening that I think any attempt at doing it is a great first start. Getting back to the earlier question about how to look at machine learning and whether it was appropriate or really needed: I think we're still in the phase where we're all learning what these things can do, how we can use them, how you even get the software installed on your computer, run it, and get the data in the right format. So just people attempting to externally validate things is super helpful. And there has been some of that. Kyle Kunze has a very nice article where they took a calculator out of Rush, predicting, I think, outcomes of hip arthroscopy for FAI, and replicated it at HSS, which was very nice. And I'm sure there are others, not to leave anyone out. Yes.

[Audience question:] Is temporal validation the same as external validation? Like, if I build a calculator on a five-year dataset where I did the 80-20 split, and then I wait two years and get two more years' worth of data in the same population, is that external validation or internal validation?

Yeah, that's a good question. I think you could describe it as either, realistically. It's external in the sense that these are people who were in no way in the first study. It's not as external as you would like, though; well, it depends. If you're only interested in the calculator being useful at your hospital, that's probably enough. If you're interested in it being useful at other people's hospitals, then yeah, you probably need data from other people's hospitals, because maybe the way they collect, I don't know, range of motion or some other variable is different. Yes, sir.

[Audience question:] Yeah, I may have misunderstood; I just want to clarify.
Earlier, you implied that if you were doing the inference study, you'd be interested in how you got from the univariate analysis to picking the variables for the multivariate analysis, but that in the prediction model you wouldn't care about that. I'm always interested in how they got to the variables in multivariable prediction or machine learning, so can you clarify that for me? I don't know if I misunderstood.

No, so I would say, for machine learning, and again this is maybe an oversimplification, you just want as many variables as possible as input, irrespective of whether you know how they might be associated with the outcome or not. That's how machine learning can really be advantageous: it takes all this crazy data, things we would not have thought could be associated with the outcome, and can manipulate them, especially when you think about deep learning creating hidden layers that then go on to predict the outcome. So I think understanding the nuance of how that happened doesn't really matter as much if you're only interested in prediction. Now, there are potentially some social issues there. If the machine decides it can use zip code and socioeconomic status to predict race, and that's one of the hidden nodes, and that then predicts whether or not you get a treatment, I don't think any of us wants that. That is one of the real dangers of machine learning: we don't really care about the inputs, we just care about the prediction. Whereas if you're interested in inference and really understanding, I think you do want to walk through the story of the univariable and then, if needed, the multivariable associations and things like that. There could be room for disagreement with that opinion, but that is just how I have come to see things. Yes, sir.

[Audience question:] Could you explain a little more what's going on in those hidden layers? It sounds like a black box.

It is literally described as a black box. If you look at any of the textbooks, it is a black box, and that is a little bit of the concern. Essentially, though, mathematically what is happening is that you're performing functions on the inputs, so you might be taking the cosine of one of the inputs and merging it with the tangent of another, and that creates a hidden value, and then that hidden value gets combined with another hidden value. Again, this is where you start to leave what we defined as machine learning and head into more artificial intelligence and visual applications. And also, you need tremendous amounts of data, tens of thousands, if not hundreds of thousands or millions, of observations to get something like that to work and to really know confidently that this crazy thing is going to work. Because if you just have 1,000 people and 10 variables, and you allow those 10 variables to create enough permutations, you might be able to perfectly predict everybody, because you've effectively created 1,000 variables for 1,000 people. Yes.

[Audience question, partially inaudible, about what happens if two different machine learning algorithms are run on the same data.]

Oh yeah, if they ran two different machine learning algorithms, they would get two different predictions, two different models, yeah. I'm sorry, I can't hear you completely. Yeah, exactly, you may not understand what happened in the middle. I mean, with a decision tree, sometimes you can get a nice visual representation.
But yeah, with most of the machine learning, I think people are still catching up on how we can visualize these things, so you won't. But it's really important to know: were those inputs ones you would consider reasonable, or is something known to be really important missing? And, more importantly, is the outcome an important outcome? Because there are a fair number of papers out there predicting things that don't matter, and if you predict something really well that doesn't matter, it still doesn't matter. Yes.

[Audience question:] Are you at AJSM or OJSM planning to implement a guideline for what needs to be included in these papers? Because I think other orthopedic and sports journals have more coverage of this ML and AI type of work than AJSM.

Yeah, there are some guidelines out there, and I think pretty much everyone is going to grab onto them; much like the QUOROM guidelines, and now the PRISMA guidelines, for meta-analysis, there do exist reporting guidelines for machine learning that people should adhere to. But then, when you look back at the PRISMA experience, adherence to the guidelines is not great. The journals all say you've got to do PRISMA, and then you look through the journals and, oh man, the compliance is pretty low. It's like the clinical trials registry requirement: yeah, you've got to register, but who's checking the registration?

[Audience follow-up:] But something simple, similar to the bullet points you mentioned in your talk, about whether the outcome is valid and whether the predictors are things we can actually collect?

Yeah, that is not necessarily in the guideline. I think the guideline assumes that someone would create a model that predicts something important. And yeah, that is a good point. Yes, sir.

[Audience comment:] So, just in real time, we're looking at using machine learning with the NHL's spotter program. We have this war room in New York where they watch and say, well, that player took a shoulder to the head, and if he did, we have the mandatory evaluation. But we're not very good at predicting concussions. I just became very familiar with machine learning because we started going back over our data and being able to ask, okay, did it occur in the first period, second period, third period? Has this player had a concussion? How long since the concussion? What age? And again, we don't really know how important each one is, but we're pretty confident now that when we call the team we can say, listen, player 17 got hit with a shoulder to the head, and by our model he has a 70 percent chance of having a concussion, just to give the team some more data. But then you go back and think, it's probably that they've had a previous concussion, but it's in that black box and we can't really know. We're just starting to look at that data.
Video Summary
The speaker discusses the use of machine learning in research and its potential benefits and challenges. They emphasize the importance of understanding the basics of machine learning and how it can be applied to various fields, including orthopedic research. They explain that machine learning is focused on prediction and can be used to improve accuracy in forecasting certain outcomes. The speaker also highlights the need for external validation to assess the performance of machine learning models on new datasets. Additionally, they stress the importance of incorporating clinical expertise when reviewing machine learning papers, as clinicians can provide valuable insights into the practicality and usefulness of the predictions. The speaker also notes the value of requiring freely available online calculators so that published machine learning models can actually be applied to real-world scenarios. However, they note that there are still challenges and limitations in the field, including the interpretability of models and the need for more validation studies. Overall, the speaker provides a broad overview of machine learning and its application in research.
Asset Caption
David C. Landy, MD, PhD
Keywords
machine learning
research
benefits
challenges
prediction
orthopedic research
accuracy
forecasting