
Welcome to Uncovering Hidden Risks, a broader set of podcasts focused on identifying the various risks organizations face as they navigate the internal and external requirements they must comply with.
 
We’ll take you on a journey through insider risks to uncover some of the hidden security threats that Microsoft and organizations across the world are facing. We will surface best-in-class technology and processes to help you protect your organization and employees from risks posed by trusted insiders. All in an open discussion with top-notch industry experts!

Learn all about Microsoft M365 Compliance solutions here. Stay up to date by following our Insider Risk blog here.

May 26, 2021

Oh my gosh
Oh my gosh, I’m dying.
Oh my gosh, I’m dying.  That’s so funny!

And in just three short lines our emotions boomeranged from intrigue, to panic, to intrigue again…and that illustrates the all-important concept of context!

In this episode of Uncovering Hidden Risks, Liz Willets and Christophe Fiessinger sit down with Senior Data Scientist, Christian Rudnick to discuss how Machine Learning and sentiment analysis are helping to unearth the newest variants of insider risks across peer networks, pictures and even global languages.

0:00

Welcome and recap of <Uncovering Hidden Risks #6: Cracking down on communication risks>

1:25

Meet our guest: Christian Rudnick, Senior Data Scientist, Microsoft Data Science and Research Team

2:00

Setting the story: Unpacking Machine Learning, sentiment analysis and the evolution of each

4:50

The canary in the coal mine: how machine learning detects unknown insider risks

9:35

Establishing intent: creating a machine learning model that understands the sentiment and intent of words

13:30

Steadying a moving target: how to improve your models and outcomes via feedback loops

19:00

A picture is worth a thousand words: how to prevent users from bypassing risk detection via GIFs and memes

23:30

Training for the future: the next big thing in machine learning, sentiment analysis and multi-language models

 

Liz Willets:

Hi everyone. Welcome back to our podcast series, Uncovering Hidden Risks, where we cover insights from the latest in news and research through conversations with thought leaders in the insider risk space. My name is Liz Willets, and I'm joined here today by my cohost Christophe Fiessinger to discuss some really interesting topics in the insider risk space. So Christophe, I know we spoke last week with Raman Kalyan and Talhah Mir, our crew from the insider risk space, about insider risks that pose a threat to organizations, all the various platforms that bring in signals and indicators, and what corporations need to think about when triaging or remediating some of those risks in their workflow. I don't know about you, but I thought that was a pretty fascinating conversation.

Christophe Fiessinger:

No, that was definitely top of mind, and definitely an exciting topic to talk about that's rapidly evolving. It's something we're pretty passionate about.

Liz Willets:

Awesome. And yeah, today I'm super excited about our guest and about uncovering more about insider risk from a machine learning and data science perspective. Joining us is Christian Rudnick, senior data scientist on our security, compliance and identity research team. So Christian, welcome. Why don't you-

Christian Rudnick:

Thank you.

Liz Willets:

... uh, just tell us a little bit about yourself and how you came into your role at Microsoft?

Christian Rudnick:

Uh, yeah. Hey, I'm Christian. I work in the compliance research team, and I kind of just slipped into it: we used to be the compliance research and email security team, and then email security moved to another team, so we were all forced into the compliance role. But at the end of the day, you know, it's just machine learning, so it's not much of a difference.

Liz Willets:

Awesome. And yeah, I know machine learning and sentiment analysis are big topics to unpack. Since you've worked so long in the machine learning space, why don't you tell us a little bit about how it has changed over the years, as well as some of the newer trends you're seeing in machine learning and sentiment analysis?

Christian Rudnick:

Yeah. In our space, the most significant progress we've seen in the past few years has been the move towards more complex models, and also towards more complex ways of analyzing the task. If you look at the models that were very common about 10 years ago, they basically just looked at words as a set of words, so the order of the words didn't matter at all. That's changed. Modern algorithms look at sentences as a sequence, and they actually take the order of the words into account when they run their analysis. The size of models has also increased dramatically over the years. For example, I mentioned earlier that I've worked in email security; the models we shipped there were often in the order of kilobytes, whereas really modern techniques for analyzing offensive language use deep neural nets, and those models can be several gigabytes in size.

Christophe Fiessinger:

What's driving that evolution of the models? I'm assuming a big goal is to make those models better and better, to really reduce the noise, things like false positives or misses. Is that what's driving some of this?

Christian Rudnick:

Yeah. At the end of the day, model size translates into complexity. You can think of it this way: smaller models have very few levers for modifying their decision, and a very large model just has that many more levers. If you want to capture the variation in your data set, you often need a lot of these levers, and newer models provide them. But it's not just that. There's one thing I didn't mention explicitly about the newer models. Traditionally, old models were trained on a relatively small set of data split into two parts, the positive set and the negative set, and the machine learning model basically tried to draw a boundary between them.

            The more modern models approach this quite differently. We do something called pre-training, which means we train a model on neutral data, which is neither positive nor negative, just to capture elements of language. Once the model is loaded up with a huge amount of this neutral data, we start feeding in positives and negatives to draw the boundary, but it can use all the information it gained from general language to make that decision.

Liz Willets:

That's super interesting. When I think about leveraging machine learning to get an early signal, something like a canary in a coal mine, it sounds like we're feeding positives and negatives on top of neutral data. But how do you go about finding the unknown unknowns and identifying risks you may not have been aware of previously with these types of models?

Christian Rudnick:

At the end of the day, it's the neutral data. The way you can see it is that you feed the model a few known positives, and that gives it an idea of where we know possible attacks are. Then it uses all the language it has learned from the neutral data to reason: okay, we've been given this data point, and everything that is semantically close to it is most likely also something we want to target. That's really the recipe. The ML we're using doesn't have magical capabilities; it can't really detect patterns that are unlike anything we've seen before. That is possible in other parts of the insider risk space, if you rely on anomaly detection, which in some sense is a different approach.

            In our approach, we have the positives; that's our starting point, and from the positives we try to see how far we can generalize to get a wider scope. In what I mentioned, uh-

Christophe Fiessinger:

Anomaly detection.

Christian Rudnick:

... anomaly detection, thank you, Christophe. It's kind of the opposite: you're trying to learn from the negatives. You try to understand what the typical state of the system is, and everything that deviates from it is an anomaly you might want to look into. So that approach has more ability to detect things which are completely unknown.
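To make the contrast concrete, here is a minimal Python sketch of the anomaly-detection idea: learn the typical state from presumed-normal history, then flag whatever deviates too far from it. The metric (daily file downloads for one user), the sample numbers, and the three-standard-deviation threshold are hypothetical choices for illustration, not a description of how Microsoft's insider risk tooling works.

```python
from __future__ import annotations

from statistics import mean, stdev


def fit_baseline(history: list[float]) -> tuple[float, float]:
    """Learn the 'typical state' (mean and spread) from presumed-normal observations."""
    return mean(history), stdev(history)


def is_anomalous(value: float, baseline: tuple[float, float], threshold: float = 3.0) -> bool:
    """Flag a value whose z-score (distance from the mean in standard deviations) exceeds threshold."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the history: anything different from the mean is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily file-download counts for one user over two weeks.
history = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15, 12, 14, 13, 15]
baseline = fit_baseline(history)

print(is_anomalous(14, baseline))   # False: an ordinary day
print(is_anomalous(250, baseline))  # True: a sudden bulk download
```

No labelled examples of bad behavior are needed, which is why this family of methods can surface previously unknown risks, at the cost of also flagging activity that is unusual but harmless.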

Christophe Fiessinger:

Yeah.

Liz Willets:

Love it. That's super enlightening from both perspectives.

Christophe Fiessinger:

Just to step back and help the audience appreciate the complexity, take a simple sentence. If I sent a Teams message to Liz saying "I will hurt you," first of all, there's no foul language. On the surface it's perfectly okay, but those words, targeted at someone else, could mean a potential threat-

Christian Rudnick:

Right.

Christophe Fiessinger:

... or harassment. So for the audience, the challenge here is not to detect every occurrence of the word "hurt," because "hurt" can be used in a perfectly acceptable context; but here, targeted at someone, that set of words could potentially be a risk. And I think-

Christian Rudnick:

Right.

Christophe Fiessinger:

... that's the journey you've been on, as well as the rest of the research team. That's where you can't just look at single words; you've got to look at a sentence, right, Christian?

Christian Rudnick:

Yes, that's exactly right. Older ML algorithms just see the "I," the "will," and the "hurt" independently, and then make a best guess based on the presence of any of these words. More modern algorithms actually look at the sequence: "I will hurt." They're perfectly capable of learning that the combination of these three words in that order is relevant, whereas if the words come in a different order, or in a different context, they might not be problematic. And let me pick up on what Liz mentioned earlier. If you train a modern algorithm on something like "I will hurt you" as a positive, it will understand that there are a lot of words similar to "hurt" that have roughly the same meaning. So it will also pick up on something like "I will kill you" or "I will crush you," even though you haven't fed those into the positive set.
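The difference between the older and newer approaches can be illustrated with a toy Python comparison. A bag of words cannot tell "I will hurt you" from the question "will I hurt you," while even simple ordered features (word bigrams) can. Bigrams are only a stand-in here; the modern models Christian describes are deep neural networks that model the whole sequence, not n-gram counters.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Older-style representation: word counts only, so word order is discarded."""
    return Counter(text.lower().split())


def bigrams(text: str) -> Counter:
    """Order-aware representation: counts of each pair of consecutive words."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


statement = "I will hurt you"
question = "will I hurt you"  # same words, different order and meaning

print(bag_of_words(statement) == bag_of_words(question))  # True: indistinguishable
print(bigrams(statement) == bigrams(question))            # False: order is captured
print(("will", "hurt") in bigrams(statement))             # True
```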

Christophe Fiessinger:

But that all falls into that kind of threat, which-

Christian Rudnick:

Yes.

Christophe Fiessinger:

... stepping back, is a risk: as soon as someone starts using that language, maybe they actually mean those things, and they're going to escalate or transition to a physical threat.

Christian Rudnick:

That's a real possibility. Yes.

Christophe Fiessinger:

Okay.

Liz Willets:

Definitely. I think it's interesting, because I feel like where you're headed with this is that you can't just use keywords to detect harassment. It's more about overall sentiment, and tackling sentiment is not an easy thing to do; looking at keywords won't cut it. I'd love to get your perspective, Christian, from an intelligence and modeling view, on identifying intent versus just the keyword level. How do you tell that a string of words might indicate that someone's about to harm someone else?

Christian Rudnick:

Yeah. So first of all, you're right. Keywords by themselves are usually not sufficient to solve this problem. There are very narrow, very focused problems where keywords might get you a long way. Take the example of profanities. If you care just about profanities, there are a lot of words you can put in a keyword filter, and that classifier is actually going to do quite well. But you're going to start seeing borderline cases where it fails: there are some words that are profanities in one context but perfectly normal words in another. I don't want to use profanities, but most of you probably know that "donkey" has a synonym which is also a swear word.

            So if you include it in your list, you will obviously hit on a message every time someone actually means to use the word in the sense of donkey. Still, for profanity, keywords can get you a long way. If you look at things like threats, it's pretty much what Christophe said earlier. Take the three words "I will hurt," or the four words "I will hurt you": each of those words will appear most of the time in a perfectly normal context where no harassment or threat is present.

Christophe Fiessinger:

Right.

Christian Rudnick:

So you can't put any of those into your keyword list. You can say, okay, I'll evolve my model from a keyword list to a key phrase list: I'll take small phrases and put them into my list. So instead of just "will" or just "hurt," you put in "I will hurt you" and "I will kill you." But now the problem is that there are a lot of different ways to combine seemingly normal words into a threat, and it's incredibly hard to enumerate all of them. And even if you did enumerate all of them, language evolves; a list that is good today may be out of date in half a year. With ML models, this problem gets solved in a very convenient way.

            First of all, thanks to pre-training, the model by default understands variations of language, so it will already capture a lot of variations that correspond to one of your input examples. Second, it's relatively easy to retrain these models based on new information that's coming in. If you install, say, a feedback loop, you give customers the possibility of saying, "Hey, look, this is another example I've found that I would like to target." It can very easily be incorporated into the model, which will then catch not only this example but also a lot of additional variations of it.
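A short Python sketch shows both failure modes Christian describes. The word and phrase lists below are hypothetical: the keyword list fires on a harmless sales message, while the exact key-phrase list misses a rephrased threat. The matching logic is deliberately naive; it illustrates the limitation, not any product's filter.

```python
import re

KEYWORDS = {"hurt", "kill"}                            # hypothetical keyword list
KEY_PHRASES = {"i will hurt you", "i will kill you"}   # hypothetical key-phrase list


def tokens(message: str) -> list:
    """Lowercase the message and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", message.lower())


def keyword_hit(message: str) -> bool:
    """Fire if any single listed word appears anywhere in the message."""
    return any(word in KEYWORDS for word in tokens(message))


def phrase_hit(message: str) -> bool:
    """Fire only if one of the listed phrases appears verbatim (after normalizing)."""
    text = " ".join(tokens(message))
    return any(phrase in text for phrase in KEY_PHRASES)


benign = "I will kill it this quarter!"
rephrased = "I am going to hurt you."

print(keyword_hit(benign))     # True  -> false positive
print(phrase_hit(benign))      # False
print(phrase_hit(rephrased))   # False -> missed variation
```

Growing either list by hand only moves the problem around, which is the gap that pre-trained models plus a feedback loop are meant to close.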

Christophe Fiessinger:

Yeah, I think what's important here is that this is not static; it's a moving target because, like you say, Christian, language evolves. There's always a new generation, new slang that spreads rapidly thanks to social media, and new ways to hurt, insult or harass someone. So you're right, it's a moving target. It's all about the learning part of machine learning: either identifying new patterns that didn't exist before because language evolves, or dismissing what we call false positives. If I'm a seller and I say, "I'm gonna kill my quota," I mean I'm going to exceed my quota. Maybe the model caught that, and we need to tell it that's okay: that sentence, "I'm gonna kill my quota," is fine. Hurting someone else is not.

Liz Willets:

Yeah. I'd love to learn a little more about this feedback loop you mentioned. Can you tell us a bit about what it looks like behind the scenes, and how you might see a model improve based on the feedback that end users give it?

Christian Rudnick:

I'll try my best (laughs). Think of it as a lens, and the lens doesn't quite hit the target. If you feed a new item back, it moves the lens slightly closer to the target, and if you keep doing it, it keeps moving until it actually hits the target. And not just the target: once again, the model can generalize, so it will hit everything that's similar.
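Christian's lens analogy can be sketched with a toy online classifier in Python: each piece of feedback takes one small gradient step that moves the score for that example toward its label, and because messages share words, similar messages move too. A plain bag-of-words logistic regression stands in here for the much larger pre-trained models discussed in the episode; the class name, learning rate, and example messages are hypothetical.

```python
import math
from collections import defaultdict


class OnlineClassifier:
    """Toy bag-of-words logistic regression, updated one labelled example at a time."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.weights = defaultdict(float)  # word -> learned weight
        self.learning_rate = learning_rate

    @staticmethod
    def _features(text: str) -> list:
        return text.lower().split()

    def score(self, text: str) -> float:
        """Probability-like score in (0, 1); 0.5 means the model has no opinion."""
        z = sum(self.weights.get(word, 0.0) for word in self._features(text))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, text: str, label: int) -> None:
        """One stochastic-gradient step on log loss; label is 1 (target) or 0 (benign)."""
        error = label - self.score(text)
        for word in self._features(text):
            self.weights[word] += self.learning_rate * error


clf = OnlineClassifier()
print(clf.score("you are worthless"))         # 0.5 before any feedback
for _ in range(5):                            # feedback confirms this example five times
    clf.update("you are worthless", 1)
print(clf.score("you are worthless") > 0.5)   # True: nudged toward the target
print(clf.score("they are worthless") > 0.5)  # True: a similar message moves too
print(clf.score("see me at lunch"))           # 0.5: unrelated words are unaffected
```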

Christophe Fiessinger:

Yeah. Just to add to that: listeners have to remember that it's an evolving target. As Christian said, the model is seeded with data, and we do our best to have representative data. But the world of languages is so fascinating because the permutations are infinite. We haven't even talked about multi-language support and globalization, but you can imagine that even within words, a lot of people swap letters for symbols to try to get away with whatever they're trying to do. The point is, the combinations are infinite.

            So the only way to tackle that is to continue to learn and evolve. And for us to learn, we need feedback, not just from one industry in one region, but from all industries across the world, whether it's a school district in the US or a manufacturer in the UK. So it's definitely a fascinating field, and one we continue to invest in.
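The letter-for-symbol swapping Christophe mentions is often countered with a normalization pass before matching or scoring. Here is a minimal Python sketch; the substitution table is a hypothetical, deliberately small example, and real obfuscation is far more varied than any fixed table.

```python
# Hypothetical look-alike table: symbol or digit -> the letter it usually stands in for.
LOOKALIKES = str.maketrans({"@": "a", "4": "a", "3": "e", "1": "i", "0": "o", "$": "s", "5": "s", "7": "t"})


def normalize(text: str) -> str:
    """Lowercase the text and map common look-alike characters back to letters."""
    return text.lower().translate(LOOKALIKES)


print(normalize("I h@te y0u"))  # i hate you
print(normalize("$tup1d"))      # stupid
print(normalize("Room 101"))    # room ioi  (legitimate digits get rewritten too)
```

As the last line shows, a fixed table also rewrites legitimate numbers, so one option is to score the normalized text alongside the original rather than replace it.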

Liz Willets:

Yeah.

Christophe Fiessinger:

What do you think, Christian?

Christian Rudnick:

Yeah, no, I completely agree. At the end of the day, it's the same image: you have a target that's moving, and you have your lens, which is trying to catch up to it. It's a bit of a curse of ML that you're always a bit behind. You always have to rely on people giving you samples, which usually means violations that have already occurred. But at the same time, the retraining cycles are fairly short, so you can adapt quite quickly to new information and adjust to new items that you would like to catch with your model.

Christophe Fiessinger:

Yeah. Is it a good analogy, Christian, to draw from things we do on the security front, like malware, phishing or viruses? That's also an evolving target.

Christian Rudnick:

Oh, absolutely. [inaudible 00:16:22] the risks in cybersecurity, yeah, the overlap is massive if you think about it. The way I like to think about it is that security deals with external attackers, whereas insider risk deals with internal attackers. So the overlap is very big; almost everything we do in compliance, we do in security in a very similar way. For example, we have a lot of ML models deployed in production, and they get retrained on a regular basis with new data. But in security, there are a lot of other features that can serve as attack vectors, and we have a lot of models built around those.

Christophe Fiessinger:

Christian, another topic we hear a lot about: sure, you get valid feedback, but what if the feedback is biased, and someone, instead of trying to improve the detections, is trying to introduce bias, whether racial or sexual in nature or anything else? How do you mitigate that type of junk or biased feedback?

Christian Rudnick:

Yeah. Junk feedback is indeed a problem. There are a few things you can do. First of all, we don't usually accept feedback from everyone. The feedback we accept usually comes from admins, and our understanding is that admins have a certain amount of knowledge they can use to give feedback.

Christophe Fiessinger:

Hmm.

Christian Rudnick:

And that's particularly true if the feedback they're looking at comes from end users. They usually won't just blindly trust it; they'll look at it, and only if it's right-

Christophe Fiessinger:

And [inaudible 00:17:57] trash.

Christian Rudnick:

Right, tri- [inaudible 00:17:59] trash. Thank you. So that's one way. Then, generally, we're not just rebuilding the model with the new data and automatically pushing it. There's actually a whole system which ensures that whatever new model we've built is better than the previous one. If someone feeds in poor feedback, you would expect the model to get worse, to do worse on the test set. In that case, we wouldn't publish the model; we'd just discard the feedback and move on. That might slow down the process, but at the same time, it ensures that the models don't degrade and actually get better.
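The promotion gate Christian describes, where a retrained model ships only if it beats the current one on a held-out test set, can be sketched in a few lines of Python. The models, test messages, and the use of plain accuracy are hypothetical simplifications; a real pipeline would likely compare richer metrics such as precision and recall on a much larger test set.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], int]   # returns 1 (flag) or 0 (do not flag)
Example = Tuple[str, int]      # (message, true label)


def accuracy(model: Model, test_set: List[Example]) -> float:
    """Fraction of held-out examples the model labels correctly."""
    return sum(model(text) == label for text, label in test_set) / len(test_set)


def choose_model(current: Model, candidate: Model, test_set: List[Example]) -> Model:
    """Ship the retrained candidate only if it strictly beats the current model."""
    if accuracy(candidate, test_set) > accuracy(current, test_set):
        return candidate
    return current  # discard the candidate (and the feedback behind it)


test_set = [
    ("I will hurt you", 1),
    ("see you at lunch", 0),
    ("I will hurt you badly", 1),
    ("great game last night", 0),
]


def current(text: str) -> int:
    """Hypothetical deployed model."""
    return int("hurt" in text.lower())


def poisoned(text: str) -> int:
    """Hypothetical model retrained on junk feedback: it stops flagging anything."""
    return 0


print(choose_model(current, poisoned, test_set) is current)  # True: poisoned model rejected
```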

Christophe Fiessinger:

So, just to reiterate: we do have a rigorous process to make sure that-

Christian Rudnick:

Yes.

Christophe Fiessinger:

... a new model doesn't blindly roll into production, and we verify the quality along the way to make sure it's converging, not diverging.

Christian Rudnick:

Yes.

Liz Willets:

Definitely. And I think having those responsible AI and ML practices is, to your point earlier, Christophe, something that's always top of mind for us, as is anything concerning privacy (laughs), really, in this day and age. But to change gears a little bit: last week, when we spoke with Raman and Talhah, we got into the conversation around GIFs and memes, and how we can prevent users from trying to bypass detection, for example by putting inappropriate language into images, and how you might extract that text from images. We'd love to hear you talk a little bit about that side of things.

Christian Rudnick:

Yeah. I'm actually not an expert in the area, but image recognition is, in general, a very well-researched field. It's actually a lot more involved than text processing. Almost everything we've done in text processing we kind of stole from the people who had previously done image processing, for example the pre-training I mentioned earlier. And in particular, there are excellent models which can extract text from images. I don't know what Microsoft's version is called, but it is very, very good. You can almost be guaranteed that if you have an image, we can extract the text that appears in it and then just process it through our regular channels.

            So that's regarding text in images. When it comes to the images themselves, that's something our team doesn't do directly, but there are lots of models that target, let's say, problematic images. What I've mostly seen is detection of adult images and gory images.

Christophe Fiessinger:

Yes.

Christian Rudnick:

And usually these classifiers operate in almost the same way as the [inaudible 00:20:47] I mentioned earlier. They're usually very big models, and they start by pre-training on just any kind of images: they use huge collections of public images to train the model, and it just learns patterns. In this case, the patterns are literally visual patterns. It'll understand round shapes and square shapes, it will have a vague understanding of the shape of a human in all sorts of different configurations, and of course it can also understand different color shadings. So a model like that will probably learn that a human shape with a lot of red on it is more likely to be a gory image than a human shape with a lot of purple or green on it.

Liz Willets:

That just reminded me of something: when you see those images and extract that text, we're still able to provide that feedback loop. I remember we had one case where we were working with a school district, and all of a sudden they started seeing a lot of homework assignments being flagged for gory images. It came down to the fact that the teacher was using a red pen to, you know-

Christian Rudnick:

Yes.

Liz Willets:

... mark up the students' tests or quizzes-

Christophe Fiessinger:

Yeah.

Liz Willets:

... or whatnot. And so there's always, you know, that feedback loop top of mind.

Christophe Fiessinger:

Yes.

Christophe Fiessinger:

Yeah. I think that ties back to exactly what Christian was saying. Obviously, with the pandemic, everything is online now, including annotating math exercises with a red pen. I guess the initial training set didn't take that type of data into account: a school district using modern digital tools for math assignments. So that's a perfect case. It detected those as potentially gory because there was a lot of red ink on a white background with formulas. But again, it gets back to what Christian was talking about: we pass along that feedback, and, just as with text detection, image detection needs to evolve. What is defined as gory needs to ignore formulas with red annotations and become more refined to avoid that in the future, because that's what we would consider a false positive. So it applies equally to any model, whether text or image: there is always that virtuous cycle of constantly learning new patterns. And this is a good example of a use case we missed when we built those models.

Liz Willets:

Christian, I'm certainly learning a lot today (laughs) through this conversation. But I'd love to learn what's next, whether in your role or with regard to machine learning and sentiment analysis in general. What do you think the next big thing will be?

Christian Rudnick:

That's a very good question (laughs). From our perspective, our main effort is to get other features into the system, even when it comes to text processing. As we discussed earlier, in security we have a much richer set of features that we've been using for quite a while now, and we want to go on the same journey with our text models. If you look at communications, for example, you can infer from a single message whether it should hit on a certain policy or not, but you get more powerful models if you look not just at that one message but at the entire conversation, or at least at the part of the conversation near your target message. For example, the language that is acceptable between students and the language that's acceptable between a student and a teacher might not necessarily be the same. So there's a very rich set of possibilities that arise from looking at all the metadata surrounding a message.
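One simple way to give a classifier the surrounding context Christian describes is to feed it the target message together with a few preceding messages rather than the message alone. The Python sketch below only builds that input; the `Message` type, the window size, and the sample chat are hypothetical, and metadata such as whether participants are students or teachers could be added in the same way.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    text: str


def with_context(conversation: list[Message], index: int, window: int = 2) -> str:
    """Join the target message with up to `window` earlier messages, oldest first."""
    start = max(0, index - window)
    return " | ".join(f"{m.sender}: {m.text}" for m in conversation[start:index + 1])


chat = [
    Message("A", "ready for the next round?"),
    Message("B", "yes, respawning now"),
    Message("A", "I will kill you"),
]

print(with_context(chat, 2))
# A: ready for the next round? | B: yes, respawning now | A: I will kill you
```

Scored alone, the last message looks like a threat; scored with its window, a model has a chance to see it is banter about a game.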

Christophe Fiessinger:

Yeah. I'm glad you mentioned getting more context, because we had an example from a school district where a student said, literally, something like "I will kill you" in Teams. That was detected. The next question was: what was the context around it? And sure enough, the context was two students playing a video game. So suddenly it went from a high alert, you know, the student is gonna-

Christian Rudnick:

Yeah.

Christophe Fiessinger:

... hurt this other student, to: no, they're just having fun. So I definitely second that. Just by adding the couple of messages before that-

Christian Rudnick:

Right.

Christophe Fiessinger:

... you see that they're just playing a video game. And even though that language might not be acceptable, it's definitely not as bad as an intent to hurt someone; it was about hurting a virtual character in the video game. So yes, I definitely second that more context will help decide whether this is really high severity and, more importantly, what to do next in terms of remediation. And Christian, one thing we haven't really talked about: we know that language is not just US English. What are we doing to cater to the other languages our customers speak worldwide?

Christian Rudnick:

Right. We started all our efforts in English, but we're currently working on globalizing our models, which means we want to provide the same protections for users in lots of other languages. We have three tiers of languages, and we're currently very focused on the first tier, but eventually we plan to get to all three. In principle, there are two ways of approaching this problem. The simplest thing you can do is build one model per language, and that works reasonably well. But what we aim for is models that can deal with all languages at once. There's been a lot of research in this area; they're called multilingual models. They use the very same techniques you use for just English [inaudible 00:26:58], but they have a few additions that make them suitable for a context with a lot of languages.

            Basically, there are very powerful models that can translate from one language to another. Multilingual models borrow a few of the ideas from those translation models, which enables them to, in some sense, relate all the languages to each other at once. So these models will understand, in a machine learning way of thinking about it, that a word in English and its translation into Greek, Spanish or French are all kind of the same. And that provides an opportunity: in particular, it means you can train models on, say, a set of languages and actually get decent performance in other languages, even though the model has seen no samples, or very few samples, from those other languages.

Liz Willets:

Uh, the more-

Christophe Fiessinger:

That's great.

Liz Willets:

... the more I listen, the more complex it gets. You're using machine learning to look at different languages, text versus images, ingesting things from different platforms. It's just mind-boggling (laughs) how much goes into this. I really want to thank you, Christian, for taking the time to chat with us today. I don't know about you, Christophe, but I learned a lot.

Christophe Fiessinger:

Fascinating. Fascinating.

Liz Willets:

Awesome. Well, thank you so much, Christian, and thank you to our listeners. We have an exciting lineup coming your way in this podcast series. Next time, we'll be talking to Kathleen Carley, a professor in social behavior analysis at Carnegie Mellon University, so definitely tune in.