Brennan Borgestad: Thank you, everybody, and welcome. Thanks for joining the session today, Demystifying AI and Its Role in Admissions. Really excited that you're here. My name is Brennan Borgestad. I’m the Enterprise Account Executive with Student Select AI, and we've got some awesome speakers today. Excited to pass the mic over to them and have them share a little bit more on what's going on in AI this morning.
So first up, we've got Dr. Emily Campion. She's an assistant professor of management and entrepreneurship at the University of Iowa. And we also are going to be hearing a little bit later from Will Rose, the CTO of Student Select AI, and we're really excited, but I'm going to pass the mic over to Emily. And, Emily, take it away.
Emily Campion: Thank you, Brennan. I really appreciate it. Hi, everyone. I'm sorry I can't be there in person. I'm coming to you today from Coralville, Iowa, which is just a little suburb of Iowa City, if you're familiar at all. So, yeah, like Brennan said, I'm an assistant professor here at Iowa. I'm out of a business school, the Tippie College of Business. Most of my research right now is on improving selection systems in organizations, and using natural language processing to do this. I've done research in other areas like daily energy fluctuations and nontraditional work arrangements and identity. But today, we're really focusing on using AI in higher ed admissions. I'm joined, as Brennan noted, by Will Rose, the chief tech officer at Student Select AI.
So I'm going to speak for about 30 minutes on using AI in admissions, using examples both in admissions but also in hiring, because that's where most of my research is right now. And I think those areas speak to each other in a lot of convenient ways. Though the goals of those two processes are not entirely the same, there is a lot of overlap, which is no surprise to any of you. So we'll follow with some examples. Will will bring some in from Student Select on measuring student characteristics to predict student success.
So as any good presentation has, we have our learning outcomes. So hopefully, by the end of this, you'll be able to describe what AI is and how natural language processing, or NLP, can be used to actually score applicant attributes, and identify how AI can be used to improve the admissions process, not just for students. We want student applicants to not hate the process of applying to our university, but we also want decision makers and staffers to experience the system a little bit more cleanly. And maybe I'm also hoping that you guys are able to link some of what you're learning in today's sessions to opportunities in your current admissions systems. And then finally, I think Will’s going to follow up here with really exploring how these non-cognitive, you know, human characteristics can really predict academic and also workplace success.
So talking a little bit about hiring and admissions, we'll cover a couple of topics here. We're going to talk about resource constraints, which should be no surprise to any of you in the audience that universities are facing some really tough resource constraints. We're going to talk about AI as one solution. You're going to hear me use a lot of hedging language, in part because I'm an academic and we often don't know things for sure. So I will be using some hedging language, just like I'm saying here, that AI is one solution. It is one helpful tool we can have in our tool belt. You're also going to hear a lot of the same things we all say about AI, like that's another tool in our tool belt today. We're also going to talk a little bit about what it is and how it — how it actually works. And this is going to be high level. I'm assuming some of you have some exposure. You've heard of course AI. You've heard ChatGPT. You've heard a lot of these things. My hope is we're going to make a little bit more sense of that today by imposing some structure on it. And then we'll talk about some use cases, things that are ongoing that I've been working on with my team, both successes and also works in progress. And then we are going to talk about addressing bias, because I think that that is one of the biggest outstanding questions people have in this area. And so we need to talk about it.
So when we think about higher ed admissions, again, none of this is going to come as a surprise to any of you. There are these resource constraints. We're dealing with tighter budgets, and for some institutions, this was a problem well before the pandemic, one that the pandemic, you know, exacerbated. I'm sure you guys are paying attention to how some schools are closing around the country. And so these tighter budgets mean fewer resources towards all sorts of things within higher ed, including, you know, admissions. Staffing: when don’t we have a staffing problem? I mean, whenever there's any shortage, that means the folks who are doing the work wear more hats and have more responsibilities. And in the world of admissions, that means sometimes less time on each application, and that has real consequences. And then finally, which I know likely many of you have been paying very close attention to, and it's sort of convenient that the Supreme Court actually did rule, I think, yesterday on affirmative action in higher ed: a lot of institutions are now working with less information. In some ways, it's because they've eased requirements for standardized test reporting. And we know there are some systematic, well, I shouldn't say “know.” We have some evidence that there are systematic differences in reporting that are really meaningful for us to be aware of as we make some of these decisions. And then, as you know, institutions have less information at the same time as they're trying to improve representation and reduce barriers to education.
So you're facing all of these resource constraints in this really resource intensive process. And let's remind ourselves that this is a high stakes decision. This is something that can impact a person's economic gain throughout their lifetime. And when I say economic gain, I don't mean just wealth for the sake of wealth. I mean generational wealth. I mean the ability to live daily and, you know, in a healthy way where they don't have to worry about, you know, finances either. You know, admissions is a high stakes decision. And so we really have to do it thoughtfully and it requires resources.
So like I mentioned, AI is what I am so excited about and why I love this topic so much is it's leveraging something we already have. I remember so fondly, and I do mean it when I say fondly, writing my app — my essay as a student when I was applying to colleges and I ended up going to Indiana University for my undergrad and loved every moment of it. But I remember writing, you know, my essay and putting all of this thought and effort into that. And so many students do that and we need to be leveraging that a little bit better. And when I say leveraging that better, I mean, again, it's a really resource intensive process to read through all of these essays and score them systematically. And that's a lot to put on our human raters. And so what's exciting about natural language processing is that we can actually leverage what's already there. That means you don't have to go out and collect more information on these student applicants if you don't want to, because a lot of it is stuff they're already producing and submitting.
So AI, and you've probably heard all sorts of definitions, but I think the one that is most useful is the simplest one, right? We're scientists. We like to think in parsimonious terms. AI is about computers imitating human behavior. That's it. And we typically are using something called “machine learning” when we're doing that. Now, for some of you, you may be more advanced in your knowledge. You may know all of these already, but for those of you who don't, let's talk a little bit about what that means.
So this is the primary way in which we do AI currently. This could change over time, of course. The goal of machine learning is for an algorithm to learn how a human decision is made and then replicate that over and over and over and over again. I hope two thoughts are coming to your mind. One is “where does the human come in?”, which we’ll talk about in a second. And the second thing you should be thinking is “if it's replicated over and over and over again, and things change over time, we should probably continue monitoring it.” And you should. You would be right, if that's what you're thinking. So we have been using basic algorithms for decades. So this is not new. I really want that to be made clear here. Some of the things we are doing are, well, I shouldn't say super new, but they're becoming more commonplace and more accessible. You've probably heard the term “neural networks.” That is something that actually started in like the 1950s, but it's just becoming more accessible now and used more often. But regression, our beloved OLS regression, we've been using that for decades. It feels new and it feels a little scary because we've now placed these methods that we trust in this broader framework that we feel like we don't trust or feel like we don't understand. And we do understand it.
So as I mentioned already, these are high stakes decisions. And when we're making those decisions, humans should be actively involved in monitoring. And that's this notion, in the computer science space, of human in the loop. What that means is broadly, you know, humans continue to be involved, so they're going to be involved in training the model against human scores.
We'll talk a little bit about what that means a little later on, because that is no small part of this. They're going to apply a psychometric theory to make sure we are actually measuring what we think we are measuring. Are we measuring, you know, emotional intelligence, social skill, conscientiousness? Well, we can do analyses. We can do all of these things to better understand and support the fact that we believe we are actually measuring those constructs. And then in a lot of ways, a human is still very much responsible in helping the machine, not only learn, but helping make sense of what the machine is learning.
And that leads to this really important point: AI should be used to aid human decision making, not necessarily replace humans. And I think this cannot be overstated. We are not yet at a place, I believe, where we should be totally replacing humans in these high stakes decisions. This is, again, another tool for individuals to use as they are making some of these high stakes decisions. So if you're in a place where you're speaking to a vendor and they're like, “Oh, we can replace people for you,” well, I don't know if we want to replace people. I think we want to free up resources or free up, you know, admissions officers' time by doing some of this scoring, doing some of these things that are highly repetitive, and then they can use that information to make, you know, really well informed decisions using all the information that we possibly have on applicants.
So when does AI really bring value? There are sort of two broad buckets, and I have to give credit to my friend and colleague Daniel Schmerling, who's an I-O psychologist and works in my space, although he's on the practice side, for helping me understand these two broader buckets early on in my life of learning about machine learning.
And so the first is when the tasks are really complex, and this is really important, when we're synthesizing information from diverse sources. I have a visual for that in a moment. Now we think, and it's totally fair that we think, we're really good at decision making. We're really, really rational. We know how to weight things the same every time we make a decision. I hate to break it to you, but it's just not true. We would love to think we're like this, but we're really not as good at this as we think we are. But you know what is? Algorithms. Algorithms are really good at this, right? You give them information, you train them to do the same thing over and over again. They give the exact amount of time and the exact weight to every bit of information, every single time. You know, humans, we suffer from resource demands, right? So at the end of the day, we might suffer from mood effects. We might suffer from the temperature; when it's chilly, I don't work as well, right? Like we suffer from some of these things. And so we're not as great all the time at synthesizing systematically. We'd like to think we are, but that's okay because we have tools that can help us do this a little bit better.
The other bucket is when tasks are repetitive, they're performed frequently and they're time consuming. These are things like reading, reading student essays, which I know is meaningful. I teach undergrad right now exclusively. I'll be teaching a Ph.D. seminar next year, but I, I know that everyone complains about student writing, but I actually love it. They're really thinking and I want to — I want to read all of that and hear a lot of that and help them process concepts in class. And I imagine reading essays for admissions is also similarly very meaningful, but it is a very time consuming task that, you know, we could allocate to, you know, an algorithm to help us do that a little bit more quickly, which maybe, hopefully, the goal is it gives us more face time with our potential students.
So broadly, and it depends on who you talk to, there's also a couple other sort of sub dimensions, but these are three broad types of machine learning. You've got supervised machine learning. What this means is we already have a label. So imagine you've got thousands of data points, thousands of applications where you've already made a decision on that student applicant, so you could build a model to make that decision similarly for future cohorts where you don't have those decisions marked by humans already. So that's supervised machine learning. You're essentially saying optimize the weights of all these variables to predict this outcome, and it'll do that time and time and time again.
Then there's unsupervised. This one's fun because what it essentially does is it looks for patterns in the data. Now you, as a human, then have to make sense of those patterns, but it's essentially identifying these optimal patterns in the data and saying here these things sort of hang together and these things sort of hang together. You can imagine, you know, student essays or comments or things like that could we can use unsupervised machine learning to understand what some of those patterns are and some of the themes that are coming out of your — of those text fields.
The last one is reinforcement learning and we all know about reinforcement learning now because these algorithms learn from feedback. So our, we'll say beloved, I know a lot of people have hesitation about ChatGPT and I think you know there's good reason to be hesitant about ChatGPT, but these are some algorithms that were learned — that were trained through this sort of continual feedback.
So let's talk just briefly using some visuals to understand what each type of machine learning is. So supervised machine learning is where you might have all these assessments on the left. So you've got scores from their assessments. Maybe that's their GMAT, maybe their GRE, maybe the SAT/ACT, whatever standardized tests or assessment they might have, and then their applicant scores, which we can get, you know, using natural language processing to sort of extract what they're talking about in their essays, and then any other applicant information you might want to include. We then use that to predict this human decision. Essentially, the model looks like this at the end of the day. It’s overly simplified, but the algorithm optimally combines these three groups of variables and then predicts the score or decision, whatever that might be. And it's important here to know that this is not unseen data.
So you're training it first where you have the decision already and then you're applying the same model to data where you don't have the decision already. And that's the value of using, you know, machine learning, where you don't have to spend all this time on human raters now reading, you know, through all of this information. They can be doing other things and using the information that comes out of these models to actually aid in their decision. Just because I put “predicted score decision” doesn't mean that's the final decision. It's just that sometimes what we have as an outcome variable to predict is the decision, whether or not they were accepted. That's not to say that you absolutely should follow that 100% each time.
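To make the train-then-apply idea concrete, here is a minimal sketch in Python. It is not the model used in any of the projects discussed in this session. It assumes two made-up, already-standardized features per applicant (a test score and an NLP-derived essay score) and made-up past decisions, and it fits a plain logistic regression by gradient descent. All names (`train_logistic`, `predict_proba`, `past_applicants`) and all numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, features):
    """Probability of a positive decision; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit intercept and feature weights by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict_proba(weights, features) - label
            grads[0] += error
            for j, x in enumerate(features):
                grads[j + 1] += error * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


# Hypothetical training data: [standardized test score, NLP essay score],
# with the human decision already recorded (1 = accepted, 0 = not accepted).
past_applicants = [
    [1.2, 0.8], [0.9, 1.1], [0.3, 0.9], [0.7, -0.1],
    [-0.2, -0.4], [-1.0, -0.7], [-0.8, 0.1], [-1.3, -1.2],
]
past_decisions = [1, 1, 1, 1, 0, 0, 0, 0]

weights = train_logistic(past_applicants, past_decisions)

# Apply the same fitted model to new applicants with no decision yet.
strong_new = predict_proba(weights, [1.0, 1.0])
weak_new = predict_proba(weights, [-1.0, -1.0])
print(f"strong applicant: {strong_new:.2f}, weak applicant: {weak_new:.2f}")
```

The output is a probability that a reviewer can weigh alongside everything else, in keeping with the point that the predicted decision does not have to be the final decision.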
So then unsupervised machine learning is where we might have these emerging patterns that come out of the student essays and where it identifies these groupings. And then you look at the comments. See, it's not necessarily the case that you're not going to read those student essays, that you're not going to read the text data that they generate. It's just that this quickens the process. And you might find, hey, in this program, whatever program that you work for in admissions, you might say, wow, these are sort of the three groupings that we're seeing. This can help inform recruitment. This can help inform how you're framing what your program can offer students as they're applying and considering you as, you know, their home for their master's program for 2 to 3 years, however long your programs are. So it offers more than just helping predict those decisions, but actually can help you in other facets of all of the things that admissions folks have to do.
Now reinforcement learning, like I mentioned, we know this now, just like Skinner's pigeons, right, we can train models to behave in desirable ways through this reinforcement. So ChatGPT, as you can see, is the logo here. It was trained to generate text data when commanded to do so by a user. So anyone working in higher ed or paying attention to the news at all knows that generative text models, such as ChatGPT, are here to stay. But fortunately there's a number of ways where you guys (I don’t know if you can hear, we just have a huge thunderstorm rolling in and I apologize if that's loud where you are, it’s shaking my house), so fortunately, we can use some of these generative text models to our benefit, right, as decision makers. And one way we can do that is through recruitment chatbots. You can develop a generative text model to use as a chatbot. Again, let's think back to one of those uses, this repetitive information. You know, students, bless their hearts, they ask the same thing over and over again, even if it's on a website, even if it's in my syllabus. So for admissions, especially if you've got thousands of students visiting your sites a day, having recruitment chatbots where they're asking these questions and getting an automatic response might be really useful. I'm sure many of you have already had experiences using some of these. There's a variety of quality out there, but with really well done ones, it means that you're free to then spend your time engaging with students in maybe more meaningful ways than sending a link to where to actually apply to your, you know, your institution.
The ones in color here are ones that I've been working on explicitly, and I'll talk about these use cases in a moment. So scoring applicant essays and other text data and then scoring video interviews.
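As a rough illustration of how groupings can emerge without any labels, here is a minimal k-means sketch in Python. It assumes each essay has already been turned into a small numeric vector (here, a made-up share of career-related and research-related language); that step and all values are hypothetical, and k-means is just one common clustering technique, not necessarily the one used in the work described here.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per point."""
    labels = []
    for point in points:
        distances = [
            sum((a - b) ** 2 for a, b in zip(point, centroid))
            for centroid in centroids
        ]
        labels.append(distances.index(min(distances)))
    return labels


def kmeans(points, k, iterations=20):
    """Plain k-means with a naive, deterministic start (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return assign(points, centroids), centroids


# Hypothetical essay vectors: [career-language share, research-language share].
essay_vectors = [
    [0.90, 0.10], [0.10, 0.90], [0.80, 0.20],
    [0.20, 0.80], [0.85, 0.05], [0.15, 0.95],
]

labels, centroids = kmeans(essay_vectors, k=2)
print(labels)
```

The algorithm only returns the groupings. A person still has to read examples from each group and decide what the group means, which is the sense-making step described above.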
So these are in hiring.
But there are two projects my team and I have right now that we're really excited about, especially the one on the left, where we have more than 1,800 applications to officer training school in the Air Force, and we've been able to uncover a bunch of human attributes like leadership or social skills and values and moral standards, things that would be very valuable to the military context. First of all, we were able to combine those scores from their text data. When I say text data here, I mean they gave great detail on personal accomplishments, career achievements, past job duties. They had statements of objectives. They had, you know, letters of recommendation. And those independently predicted whether or not they were going to be hired. Actually, their hiring rating. It's a continuous score, not binary, here. But we were actually able to predict whether or not they were going to be accepted into that program, and their subsequent performance. That's just their text data. So when we're combining it with mental ability, which we know continues to be sort of the primary predictor of performance, and other job knowledge assessments, we were able to then continue to predict hiring ratings and performance.
And it's this last point that I'm terribly excited about, because this has been an enduring challenge within organizations forever, which is, you know, how do we improve representation in our organizations? And what we found is by adding this additional information that had these, what Will’s later going to talk about, some of these non-cognitive characteristics that still are skills they're bringing to the table, we were able to reduce racial subgroup differences in their total score, which does have, you know, a positive influence on adverse impact. And by positive I mean reducing adverse impact in hiring. And so that's something we're finding, that the more information we're able to use, the more we're able to actually, we're hoping, you know, create more equity in our hiring systems. And we suspect the same could happen for admissions. And then this other example is with more than 70,000 applications to professional jobs. What we love about this one is it really drives this point home of monitoring.
So to build a model and just leave it for a decade, that's not, please don't do that. These are living things. You've got to update it as you go, just like you update, you know, maybe your admissions criteria over time or you focus more on certain skills and abilities. Sometimes maybe you focus on others as trends evolve. Right? Maybe 20 years ago, we didn't care if you knew how to code, but we care now. So those things evolve, as should your models. So in this one, we were able to score and find communication skills, interpersonal skills, leadership skills. Leadership comes up a lot, which shouldn't surprise any of us. You know, business skills and also job knowledge. This is one stage of a multi-stage process, but we continue to monitor this and actually we rebuilt the model recently to help improve its ability to predict. And so we consider this a really big success.
Now applications in higher ed. This is some of the stuff that my team and I are working on with Student Select AI, and this is exciting because we haven't often been able to work in the admissions context. As you guys know, faculty, like I'm not involved in admissions activity at all. Those things are often kept separate, right? Faculty. My job is to, you know, produce knowledge and make discoveries and publish and get that information into the hands of people who can use it. And that isn't always necessarily admissions, which I know nothing about. So that's not an indictment in any way. I'm just not involved in that at all.
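One common way to put a number on adverse impact in hiring is the impact ratio: a group's selection rate divided by the selection rate of the most-selected group, often compared against the four-fifths (0.8) rule of thumb. That rule is not mentioned in the talk, so treat it as an assumption here. The Python sketch below uses entirely made-up counts and holds the reference group's rate fixed for simplicity.

```python
def selection_rate(selected: int, applicants: int) -> float:
    """Share of a group's applicants who were selected."""
    return selected / applicants


def impact_ratio(focal_rate: float, reference_rate: float) -> float:
    """Focal group's selection rate relative to the reference group's."""
    return focal_rate / reference_rate


FOUR_FIFTHS = 0.8  # rule-of-thumb threshold; an assumption, not from the talk

# Hypothetical counts before adding NLP-derived scores to the total score.
reference = selection_rate(selected=50, applicants=100)
focal_before = selection_rate(selected=30, applicants=100)
# Hypothetical counts after adding them.
focal_after = selection_rate(selected=45, applicants=100)

ratio_before = impact_ratio(focal_before, reference)
ratio_after = impact_ratio(focal_after, reference)

for label, ratio in [("before", ratio_before), ("after", ratio_after)]:
    status = "below" if ratio < FOUR_FIFTHS else "at or above"
    print(f"{label}: impact ratio {ratio:.2f} ({status} the 0.8 threshold)")
```

A ratio closer to 1.0 means the two selection rates are closer together. The same calculation could in principle be run on admissions decisions, though whether the four-fifths threshold is the right yardstick there is an open question.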
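A simple form of the monitoring described above is to re-check, cohort by cohort, how well the model's scores track the outcome they are meant to predict. The Python sketch below assumes made-up scores and outcomes for two cohorts and a hypothetical minimum correlation of 0.3 as the trigger for a rebuild; the threshold and the function name `needs_rebuild` are illustrative assumptions, not a recommendation from the speakers.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def needs_rebuild(model_scores, outcomes, min_validity=0.3):
    """Flag a cohort where scores no longer track outcomes well enough."""
    return pearson_r(model_scores, outcomes) < min_validity


# Hypothetical cohorts: model scores versus a later outcome (e.g. a rating).
older_scores = [1, 2, 3, 4, 5]
older_outcomes = [2.1, 2.9, 3.2, 3.8, 4.0]
recent_scores = [1, 2, 3, 4, 5]
recent_outcomes = [3.0, 3.9, 2.8, 3.1, 3.4]

older_flag = needs_rebuild(older_scores, older_outcomes)
recent_flag = needs_rebuild(recent_scores, recent_outcomes)
print(f"older cohort needs rebuild: {older_flag}")
print(f"recent cohort needs rebuild: {recent_flag}")
```

In practice the same check would also be run by subgroup, and a flag would prompt a human review of the data and the criteria before any retraining.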
But what was exciting here is we've been able to deductively develop dictionaries, right? So remember, we're working in the natural language processing space, the text space, not just already quantified data like SAT scores, GMAT scores, and we're able to capture traits like the Big Five, right? So we know that that is OCEAN: openness to experience (now I have to go through it in my head), conscientiousness, extroversion, agreeableness and neuroticism. And then also competencies: leading and decision making. I should note, I know not everyone goes by OCEAN. Some people go by CANOE. I see you and I'll do that one next time just to keep things even. And then we tested this in 11 data sets. So if you guys know anything about hiring, we typically don't have 11 data sets. We typically have teeny data sets of like 200 people, if we’re lucky. But with this we were able to actually test the validity. And by “validity” here, I mean, is it actually predicting things that matter? Is it predicting their applications or is it predicting their GPA? Is it predicting whether or not they were accepted into programs? And we did find throughout these 11 data sets, which included graduate schools in the health care space, military and professional job applications, it did predict these things. It had what we call, you know, criterion-related validity: it predicted things that we really cared about. And then finally, it actually is capturing things that we aren't already measuring.
And we know this because we can take the application scores and we can take the characteristics we're trying to score and see if it predicts beyond the application scores. And we are finding that to be the case. So it's gathering things that we're not already capturing systematically. Remember that “systematically” part is really important because that tends — we tend to find that it also improves fairness.
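For readers unfamiliar with dictionary-based scoring, the Python sketch below shows the basic mechanic: count how often words from a trait's word list appear in a text, relative to its length. The word lists, the trait names used as keys, and the sample essay are all invented for illustration. The dictionaries described in this session were developed deductively and validated across data sets, which a toy example like this does not attempt.

```python
import re

# Hypothetical, tiny word lists; real dictionaries are far larger and validated.
TRAIT_DICTIONARIES = {
    "conscientiousness": {"planned", "deadline", "prepared", "thorough"},
    "leading": {"led", "managed", "mentored", "directed"},
}


def score_traits(text, dictionaries):
    """Share of tokens in the text that appear in each trait's word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {
        trait: sum(token in words for token in tokens) / total
        for trait, words in dictionaries.items()
    }


essay = (
    "I led a team of volunteers and planned every deadline "
    "so we were prepared."
)
scores = score_traits(essay, TRAIT_DICTIONARIES)
for trait, value in scores.items():
    print(f"{trait}: {value:.3f}")
```

Scores like these become additional predictors. Checking whether they predict beyond the existing application scores then amounts to comparing a model with and without them.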
I'm aware we only have a couple of minutes. I'm getting — we're getting there. We're almost there. So moving on to bias. There are plenty of examples of concerning AI models we can spend an entire session and we wouldn't even be close to being done talking about that. But we — this is the greatest thing — is we can understand why it happens most of the time. And that means if we can understand why it happens, we can try to address it.
So when we think about bias in AI, these are things I really want everyone here to take away. First and foremost, really, truly, is data quality. You guys have heard, I'm sure, the adage of “garbage in, garbage out.” People in, you know, data science/computer science who do this sort of work will tell you it's now “landfill in, landfill out” because you've got so many data points. So if your data quality is poor, your models are going to be poor and you should not be using them. So what does that mean? If you're training to human judgment, you need to pay the most attention to how the human ratings are developed. Is it a single person who didn't use anchored rating scales and just said 1 to 5? That's not great. You want a panel of individuals, you want behaviorally anchored rating scales, you want high reliability, and frequent check-ins to ensure that individuals are making those same decisions over and over and over again. Next is eliminating data that are associated with protected class. This mostly comes from the hiring side, but again, with the recent Supreme Court ruling, it means that we in higher ed will likely have to eliminate some of that information as well.
And then interrogating the results and continuing to monitor. That means look at your bivariate statistics. Look at the descriptive statistics. What's going on? You have to still be very familiar with your data to determine whether or not the data quality issues are huge. And you might think to yourself, walking away, “wow, she didn't really talk about bias all that much.” What I'm talking about here are the things that create these really problematic decisions, because at the end of the day, we do know there are some differences among genders. I use gender as an example because I'm a woman and there are some differences. I will likely never be as strong as the average man, not just because I'm five three on a good day and quite petite; it’s just not going to happen. We know that there are these differences and so we do sometimes expect differences. That doesn't mean bias has occurred necessarily. It doesn't mean discrimination has occurred. It can mean that the model is actually accurately measuring what we expect it to measure. And so that's why I focus so much on data quality, because the likelihood that you're going to include measurement error, which is error you don't want and that can produce bias, is reduced when you focus aggressively on data quality and interrogating the outcomes of the models.
So five questions you should ask any vendor as you are approached by them in admissions. First, what data was your model trained on? And this is interesting to think about because a lot of these open source models are using open source data to train. So they're using Wikipedia, they're using news articles. And my question, which I don't have an empirical answer for, is are these the appropriate data to train a model on for higher ed decisions?
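One way to check the rater consistency mentioned above is an agreement statistic such as Cohen's kappa, which compares how often two raters agree with how often they would agree by chance. The talk does not name a specific statistic, so this choice is an assumption; for ordinal 1-to-5 scales, a weighted kappa or an intraclass correlation is often preferred. The ratings in this Python sketch are made up.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Undefined (division by zero) if chance agreement is exactly 1,
    e.g. when both raters give every item the same single score.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical 1-to-5 essay ratings from two raters on the same eight essays.
rater_a = [1, 2, 3, 4, 5, 3, 2, 4]
rater_b = [1, 2, 3, 4, 5, 3, 3, 4]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

Low agreement is a signal to revisit the rating scale and rater training before using those ratings as the target a model is trained to reproduce.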
My domain expertise makes me think, no, it's not the same. These aren't necessarily first-person written about my experiences to showcase the characteristics I have to be successful in your institution. They're reporting news. They're reporting on, you know, a biography of another person. However, I'm really interested in a study that actually compares the use of different sources of data to train these sorts of models for an empirical answer.
And then the quality. Remember, garbage in, garbage out.
Third is, is it actually predicting meaningful outcomes? So it's really cool in and of itself. From a scientific perspective, we can capture conscientiousness, but it should be predicting performance. And if it's not predicting performance, first, are you measuring conscientiousness well enough? And second, what's the utility of your model if you're not actually capturing these things? So does it actually predict meaningful outcomes?
And of course, content validity? Are you measuring what you think you're measuring?
And then finally, are you aware of and how are you responding to ongoing legislation? This is a good question for everyone, in light of recent New York legislation on the use of automated employment decision tools, which is out of New York City. That doesn't necessarily affect admissions directly, but these things do cross over over time.
And so the big high level takeaway, then I’m going to get into the five smaller takeaways, is AI comprises systems that extend human capability by sensing, comprehending, acting and learning. And this is really important. Again, I want to drive this point home: we're not trying to replace people. We're trying to provide additional tools to show information to help decision makers make better decisions, more systematic and fairer decisions.
So the big takeaways, I believe, from this, I hope, are: resource constraints require solutions, and AI is one possible solution. AI offers an opportunity to score text data, which is costly and time consuming to do by hand. We can use NLP, natural language processing, to uncover applicant traits. Will’s going to talk about that in a moment. And whether and how our models favor certain subgroups is something you can understand, identify and address.
And finally, AI in higher education, like any other system, requires monitoring.
Ooh. 2 minutes over. For an academic, that's not too bad. That's not too bad. So I'm going to stop sharing, and Will, I will hand it over to you.
Will Rose: Great. Thanks, Emily. And I will share my screen now. And Emily, can you confirm that you can see my screen?
Emily Campion: Yes. Looks good.
Will Rose: Great. Well, thanks so much. I'm realizing that the pressure's on now that I have to follow you. So hopefully this is, you know, I maintain the interest here. But don't worry. My section isn't that long. So thanks again for everyone joining. As I mentioned before, my name is Will Rose. I'm the chief technology officer at Talent Sele— I'm sorry — Student Select AI. Talent Select is our parent organization.
So I want to extend what Emily's been talking about with some of the specific areas that we've been working on, getting a lot of help from experts like Emily to validate what we're doing and to explore new ways that we can leverage these technologies and the research behind them. So, you know, just to give a little context: our organization, we have a deep history on the hiring side. And early on, one of the questions that was posed was, you know, with the deep history and research that are helpful in selection on the hiring side, can some of that be applied on the admissions side? The short answer to that is yes.
And I want to show you some examples of that. So the question we pose here on the screen is: can psychometrics predict best-fit applicants? So psychometrics has a long history on the hiring side. Decades. You know, we're not inventing anything new around psychometrics in terms of the science behind it. But what we're doing is we're applying AI to be able to measure and extract that in a reliable way.
So, you know, we've worked with several graduate programs across the country, across different disciplines, to analyze essays and interviews. And we are showing that there are clear differences between not only accepted and declined applicants, but, you know, successful and unsuccessful students in these programs. So we're seeing clear evidence that these psychometrics are useful in terms of, you know, the admissions selection process, but also in predicting several different types of outcomes.
So one thing and Emily brought this up and Emily did mention the recent Supreme Court decision, which, you know, is a topic of discussion. But, you know, many of us saw this coming for a while now. And a lot of programs that we work with have been preparing for this. But ultimately, you know, a lot of programs have been trying to find ways to make their admissions process more holistic to address that. And also, you know, finding ways to incorporate more unbiased, non-biased data points into their admission selection process. And this is one way programs are doing that. So Emily touched on the potential here for incorporating unbiased data into the models. And, you know, this is something that we're seeing as a big potential for programs to incorporate a lot of new data points that programs weren't factoring in before.
That is going to help that holistic admissions procedure. So, you know, our insights provide justification to accept applicants who may fall short of traditional admissions requirements. So one thing that programs are doing is when you're seeing applicants that are kind of on the border, maybe they're a little bit weaker on the test scores and the GPAs, but they are scoring very strongly in some of these traits that correlate with success. This is providing justification to give a second look to these particular applicants and potentially give them an opportunity they wouldn't otherwise have to join the program.
So just a couple of key points here. You know, we're able to get these psychometric insights not by requiring your applicants to take a separate personality test or psychometric assessment. We're able to measure these traits just by analyzing the admissions essay or the admissions interview. So there's no extra step that's required. We're simply analyzing data that you're already collecting in your current admissions process. And I think that's a very key point. We're not trying to complicate or, you know, add new things to your admissions process. You're already collecting this information. We can provide you these extra data points without any extra effort and essentially give you a lot more data to work with on making your admissions decisions.
And that wraps up my section. We're definitely more than happy to answer any questions you have on anything that Emily covered or any of the data that I shared with you today. Appreciate everyone's time.
Brennan Borgestad: Thank you, Emily and Will. I'm going to pass this microphone around. If anyone has any questions, I'll come run to you. I was a cross-country runner, but very much more out of shape. So I'll run as fast as I can to you, and we'll chase you down. I see a hand up. Perfect. Run.
Audience Member #1: Thank you for your time and information. Question around the predictions, you know, of acceptance and performance. If someone's utilizing ChatGPT for their essays and everything they're submitting, how do you know that those are kind of authentic outcomes?
Emily Campion: I love this question, and I'm so glad that you asked that. So fortunately, there have been a number of open source (and when I say open source, I mean totally free to you) and validated models that can detect plagiarism. So some of what we already have, I don't know what Georgia, did you say Georgia State I think, uses. Some schools use Turnitin. We have plagiarism checkers that we can certainly use and run through. I think, and this is something I don't spend as much time on, which is why I didn't spend as much time on it here, that it will require some additional policy work from admissions to determine what the courses of action are once it's detected. Do you give them another chance? Should it be proctored? I would say I hope you give them another chance. Sometimes they need to learn the hard way. But I think there are plenty of tools out there that are absolutely free for institutions to use to determine plagiarism and whether or not ChatGPT could have been used. I don't know if that totally answered your question.
Brennan Borgestad: We're getting head nods. That's good. Yeah. Okay. Any other questions? Hands. Looking for hands. Oh, perfect.
Audience Member #2: I have a very elementary understanding of anything related to data or quant, and maybe under elementary, so. But I'm interested in the idea of if your institution has kind of a homegrown algorithm or a data model that they've been using to assist in the evaluation selection process, how might AI fit into that? Generally, and I don't know if this is a like dumb question that everyone knows the answer to, but how, just I guess generally how AI can fit in to homegrown data models and algorithms.
Emily Campion: So first of all, like I tell all my students, no dumb questions. This is a complicated topic. I've spent almost ten years working in this space and, like, every day I'm learning about a new algorithm, like “oh man, now I got to learn something else.” So first of all, no dumb questions in any space, especially at a conference. We’re supposed to be learning. I promise ten other people had that question. So your question is around how AI can fit into sort of the homegrown algorithms. If you have these homegrown algorithms, it might be that you're already using it, right? So when I mentioned that OLS regression, and I'll break this down a little differently, if you already have some sort of algorithm you're using to combine candidate information in any way, and then you're using that same algorithm every time, you're using machine learning. Which means you're using AI already.
There may be other ways you can use the various pieces of AI to improve the process. So if you're already using it in that system, in that way, your homegrown algorithm, as you mentioned, first of all, I hope you continue to monitor that. I suspect being [...] you are. I suspect everybody is. I shouldn't suspect that, but I'm hoping everybody is. But there may be other ways you can use it. So, for example, and the legality of this is a little bit squishy, but, like, you could maybe use AI to scrape social media data. The question then being: is social media a fair representation of someone's characteristics? I would argue maybe not, but maybe more so in recent generations where they're comfortable living their lives online, maybe it's a more authentic representation.
Other ways are developing, like I mentioned this, you know, a chatbot to help with recruitment questions. So there are other ways that you can integrate it if you're already using these homegrown algorithms. I don't think — I wouldn't want to say, “Well, you should just replace it with a new model.” If you've got a model that's already predicting well, and you can explain largely what it's doing, I wouldn't say replace it. That's a cost you don't need to make at a time where we don't have a lot of resources.
Audience Member #2: Thank you.
Audience Member #3: From what I understand, AI is using a lot of historical data in its algorithms to make some predictions. And if you make some strategic shifts in terms of what you're looking for, how does that factor into an algorithm?
Emily Campion: You hit the nail on the head here. I think, you know, if you are making a strategic shift, consider how then you should be revising your model. These are, you know, sort of living, breathing things. Well, not really; that makes it sound like I'm talking about, like, robots or Skynet or something. No, they're not that type of [...], but they should be updated. Right? So let's pretend you don't use any algorithms. Let's pretend you totally rely on human readers. You have a small army of well-trained individuals who are going through every piece of applicant information all throughout the summer and fall, as soon as all this information is starting to be sent and submitted. I would say it's likely, if you have a strategic shift and you're not using AI, you'll have to meet with your small army of raters and reorient them. Right? So, hey, we're no longer as concerned with this skill. We want this skill. So it's very similar, I think, strategically when it comes to monitoring and updating, whether it's an AI model that you're using to aid in this decision or if you're relying totally on human raters. That updating and monitoring should be happening regardless. It's just that the time it takes to run a model is much less than the time it takes to employ, you know, human raters to score everything. So I hope that answers your question. I did hear you say historical data. I don't know if you want to go down the route of the problems that come with historical data.
But your question around “should we shift things and what does that mean for AI?” You should shift there just as you shift with human raters.
Audience Member #3: I'm sorry. When you say the issue with historical data, what were you referring to?
Emily Campion: Yeah, so often when we talk about historical data, I'm sure you guys have seen some headlines. I mentioned some models that are sort of challenging. And, let me be clear, you guys are often using historical data. This can give you an opportunity to look at your historical data and say, “Are we selecting certain subgroups more, and how is that happening and why is that happening?”
So the only thing I mentioned is that sometimes, as with any data, we can see subgroup differences that sort of emerge. The company we always throw under the bus, and I have many friends at Amazon that I know are doing exceptional work, but the one we always talk about is how one of their models a few years ago was, like, just selecting men, and it was because it was a series of STEM positions that historically selected men. And so the model replicated that decision. And so they stopped using it. They were responding, I think, to a lot of backlash, which is, you know, it's a company; they have to respond to backlash that way. What they could have done is looked at their model perhaps and said, why is this happening?
And then they did, and they did find out some of it was occurring any time there was sort of a gender identity marker that had to do with, you know, women. So softball, or if a person was identifying a women's college or something like that, the model said, oh, we've never really hired someone who does that. So the model says, don't hire this person.
So that can be a problem with historical data if you're not monitoring. But, Rodney, I'm going to assume that you guys monitor very closely those decisions and whether or not there are subgroup differences there already. And so you guys would likely then not have a problem with that. But it's that data quality element I was talking about before.
Does that help?
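[Illustrative sketch, not part of the session] The subgroup monitoring described above can be made concrete with a small check on historical decisions. The Python below is only a minimal sketch under stated assumptions: the group labels and counts are hypothetical, the function names are invented for illustration, and the 0.8 threshold borrows the four-fifths rule of thumb from U.S. employment selection, which is used here as a screening flag rather than as any legal test for admissions.

```python
from collections import defaultdict


def selection_rates(records):
    """Return {group: share selected} from (group, selected) pairs."""
    totals = defaultdict(int)
    selected_counts = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        selected_counts[group] += int(selected)
    return {group: selected_counts[group] / totals[group] for group in totals}


def impact_ratios(rates):
    """Divide each group's selection rate by the highest group's rate."""
    top = max(rates.values())
    if top == 0:
        # Nobody was selected in any group, so there is no gap to report.
        return {group: 1.0 for group in rates}
    return {group: rate / top for group, rate in rates.items()}


def flag_groups(ratios, threshold=0.8):
    """List groups whose ratio falls below the (assumed) threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical historical decisions: 100 applicants per group.
history = (
    [("group_a", True)] * 40 + [("group_a", False)] * 60
    + [("group_b", True)] * 20 + [("group_b", False)] * 80
)

rates = selection_rates(history)
ratios = impact_ratios(rates)
flagged = flag_groups(ratios)
print(rates)
print(ratios)
print(flagged)
```

A flagged group is a prompt to ask the "how is that happening and why is that happening" questions about the data and the model, not a conclusion in itself. The same check applies whether the decisions came from a model or from human raters.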
Audience Member #3: [...]
Emily Campion: Oh, I didn't hear that, but I'm going to assume you said yes.
Audience Member #4: My question's for either Emily or Will, but I just was curious if you could comment on the use of AI, not necessarily with applicants, but with inquiries, kind of at the very top. So what's going on as common practice in the industry right now for machine learning, where we're maybe scrubbing data off LinkedIn, or very limited information about job title, company, location, and even also trying to predict intent to apply? So just curious if either of you could comment on that: when someone's not quite at the applicant stage, how could this technology be used by universities and colleges like us?
Emily Campion: Will, do you want to take it?
Will Rose: Yeah. So there's some interesting work being done out in the marketplace. Not us, but there are other vendors out there exploring how to take lots of different data points, including what can be scraped from social media, to predict things like, you know, who's actually going to enroll, which is helpful to, you know, map out your financial model for the coming years.
So there's a lot of work being done in that area. It's, you know, simply what data is available out there, and what do you want to predict? If you have the data and you have an idea of what you want to predict, you can, you know, in most cases build a machine learning model to try to predict that. And I do see a lot of activity in that area, you know, not just in admissions but on the hiring side too. Candidate sourcing for recruiters is a hot area. You know, not just posting a job on a job board, but how do you actually identify and proactively reach out to applicants that you want to apply?
And, you know, the same kind of logic translates into admissions, where you identify the type of applicants that you want to attract proactively and are able to predict, you know, will they actually apply, will they actually enroll? That is activity that's being done. And, you know, it's not overly complicated to train models that could be applied to do that. It's just really dependent on the type of data out there and what's actually predictive of those outcomes.
Audience Member #4: Thank you.
Audience Member #5: So I have a question about, you know, the dataset that you use. If you use only, you know, the data that we could get from applications, then definitely, you know, the result would be similar to what we have. I would say, you know, from our interview, an amazing evaluation. So are there actually other datasets that you could access, and what is the current, you know, [...] from the government in the U.S., for example, that you can use to help us make a better decision? Thank you.
Emily Campion: Yeah. So I had mentioned this in one of my slides. I like this question because, you know, the more data we have to train, the more high-quality data we have to train, I caught myself, presumably the better model we will have. And by “better,” I, you know, I mean that in a number of ways: more defensible. We know what's going on. We know why it's predicting. We know what constructs, what variables, what human attributes are being measured. I had mentioned this among these slides. There's a lot of open source data that's available, like Wikipedia. Like, there's one that I've seen that's just PDFs and PDFs of books on books, but, like, digitally. And so, you know, that's okay to train what we'll call, like, a sentiment model, where you're trying to see, like, what the sentiment is of someone's tweet.
You can use those types of models to do that. But I do have concerns about using those datasets as a way to train, because they have different purposes. Right? So writing a news article, I used to be a journalist, writing a news article, I'm not writing first person. I'm reporting on an event and, you know, there might be some human characteristics in there we can capture and capitalize on, but likely it doesn't have the information necessary. Right? So think about what's the content of news articles versus the content of the data you're trying to use. The content of Wikipedia and news articles is reports and biographies, and not first person, right? Wikipedia is not first person. And then you've got personal essays that you're trying to use, interview responses.
Great. I'm glad you brought that up. Transcripts of interviewer responses. Those have different constructs. And so if you're using data that aren't developed for the same purpose, I worry that you aren't actually going to be able to predict as well in your data because it won't capture everything. Does that make sense? So even though I can, I can completely empathize that you're like, we would love to build these models.
We just don't have a ton of data. What else can we use? You know, maybe there's an opportunity to work among institutions to share it or build a communal model or something. That sounds like it might not work, because there's competition. But nevertheless, I would be worried about using some of these datasets that were developed for completely separate purposes, because I just don't think they have the information, which means, so let me bring this home, it's going to miss information that you're trying to capture from your student applicants if you don't use data that's built for sort of that purpose. So it's better to use your historical data, you know, well cleaned with a good criterion, than to rely on these open source datasets, even though you might see, oh, my gosh, these thousands of data points. I just don't know. But I don't know that empirically. I have not seen a study do that; that is just built from my, you know, theoretical understanding of these things. I mean, you asked about law, too. I don't know. There's some stuff in the news right now about ChatGPT, that team, having used, you know, personal information. That's not something I actually spend much time in, so I can't spend much time talking about the legality. Will, do you know anything about that?
Will Rose: Not too much in that area specifically, unfortunately.
Emily Campion: Sorry. I hope that answered your question.
Brennan Borgestad: So any other questions? Any other hands? I think that's it and we're getting close to the time. Thank you, everyone, please. Yeah, a big thank you to Dr. Emily Campion and Will Rose.