AIAW Podcast

E177 - AI Quality and Security - Magnus Hyttsten

Hyperight Season 12 Episode 5


Duration: 2:09:55

In Episode 177 of the AIAW Podcast, we sit down with Magnus Hyttsten, former Google AI Engineering Lead and EU AI Act specialist, for a grounded and timely conversation on AI quality, security, and compliance at enterprise scale.

 
Drawing on his experience building generative AI evaluation systems at Google, Magnus breaks down the real challenge of defining “quality” in non-deterministic models and explains why robust evaluation frameworks, engineering discipline, and governance are becoming mission-critical. 

We explore the EU AI Act as a potential accelerator for trustworthy innovation rather than a constraint, unpack common security blind spots in enterprise AI strategies, and look ahead to what responsible AGI development might require.

If you’re serious about moving from AI experimentation to secure, production-ready systems, this episode is essential listening. 

Follow us on YouTube: https://www.youtube.com/@aiawpodcast

Magnus Hyttsten

You're expected to work, or that's the expectation, an almost infinite amount of time, or as much as you possibly can, like the 996 working schedule.

Anders Arpteg

Yeah, from nine o'clock in the morning to nine o'clock in the evening. Yes. Six days a week.

Magnus Hyttsten

So that's basically Monday through Saturday, 9 a.m. to 9 p.m. What do you do after that? You go home, maybe ramp down a little bit, go to bed, and then you have to get up and work again, six days a week. So that's 72 hours, right? And even beyond 72 hours, I mean, many people work more than that, especially in the startup space.

Henrik Göthberg

But you said something that I think matters. In Sweden, or in Europe, with our old engineering culture, it's not only that we work; we have a more structured view of getting from A to B, and of how that should play out, so we are effective and get impact in those hours. Whereas the American work culture is more result-based: we have this target, we have this delivery, I don't care how you do it, you work 24/7 towards the result. So there's a bit of a distinction in how you view your work.

Magnus Hyttsten

I think so, but I think the most important thing in Europe is that, just by living in a culture where normal work is eight to five, you get to cram all your responsibility into that box to the maximum extent possible. Whereas in a startup environment in Silicon Valley, there's no eight to five; there is whatever. So you may slack off a little more during the day and then work evenings instead. That discipline isn't really there.

Henrik Göthberg

You said something interesting there: having experienced both, you're not necessarily getting more work done from hour 60 to hour 80, so to speak. So in a way, if you structure yourself with discipline in those normal hours, the actual quality of work and the outcome is not that different.

Magnus Hyttsten

No, I personally definitely think so. I mean, 40-hour weeks, if you have a thinking job, that's still a lot of mental load that you're putting into the work you're doing, right? If you ramp that up to 60 hours, you're working quite a lot. Beyond 60, I think you're definitely hitting a point of no benefit. Diminishing returns.

Office, Remote, And Culture Shifts

Anders Arpteg

Yeah. What about working in the office or from home? What do you think the culture is there?

Magnus Hyttsten

Oh, everything changed after COVID. I mean, everyone is now trying to get people back into the office. If we look at just the efficiency of work, I think that greatly depends on the person you're talking to. Some people are very disciplined and can work at home. When I studied at university, for example, I couldn't sit in my student dorm and work; I had to go and sit in the library, because there I was disciplined.

Anders Arpteg

It probably depends on how productive you can be working from home. But yeah, in Sweden it's still kind of open to work from home.

Henrik Göthberg

I know Spotify, for example, has the work-from-anywhere kind of policy, and they've been quite vocal about that as a contrast, especially in the US market, to make it almost a selling point.

Magnus Hyttsten

Yeah. But then there is also a lot of value in actually meeting people, talking to people. Fully remote work, on the other hand, creates isolation and all the problems associated with that: coordination, getting to know the other people socially.

Anders Arpteg

Yeah, a Teams or Google Meet meeting is very different from a physical meeting.

Startup Grit And Broad Roles

Henrik Göthberg

Yes. And you, of course, have worked both in a startup environment, here in Stockholm and then in the US, and then gone from a startup environment into a Google environment. So can we contrast the startup culture with the tech-giant culture a bit? Because in my opinion, even if you go to a startup in Stockholm, you get that sort of hero approach.

Magnus Hyttsten

Yeah, yeah. I mean, the hero work, right? Sure. When I was running the startup, sometimes we slept under the desk. And the thing with startups is that everyone has such a broad spectrum of responsibilities, in particular the founders and the leadership: you need to be able to hire people, do career management, customer acquisition, financing, all of these things combined in your workday.

Can AI Shorten The Workweek

Anders Arpteg

If you look a bit further ahead and take software engineers, for example: some people are claiming AI will take all the jobs, but couldn't it be the case that we just reduce the workload we have as humans, and perhaps we don't need to work 996, or perhaps we even just work 20 hours a week?

Magnus Hyttsten

It's a combination, right? It's definitely a combination. Look at the spreadsheet when that was introduced, back in the, what, early 1990s. I mean, what did we do before then? It was a lot of manual labor to do what spreadsheets were able to do going into the 90s. And the same thing with AI, right? There are certain things AI will be able to automate that were not possible at all before. So there are going to be all variations on this theme. From a personal perspective, the thing AI creates that we have never, ever seen before is the ability for a machine to mimic the thinking process of a human.

Anders Arpteg

Just going back to the working hours per week: I heard someone say, and I hope it's a nice outlook on the future, that perhaps in 10 years you will look back and say, do you know, back in 2026 they actually worked 40 hours a week? That's crazy. We just work like two, three days a week, and that's it. Do you think that could be the case?

Magnus Hyttsten

Well, the question is how the economics are going to work out. What is the new economic model that we can roll out to the world, coherently and in a structured way, that doesn't cause too much disruption to our cultures, and that would be able to support that?

Anders Arpteg

It would be a nice future, right?

Magnus Hyttsten

Well, maybe. Maybe. Have you seen the movie WALL-E?

Henrik Göthberg

Yeah.

Magnus Hyttsten

I mean, in the end, everything gets automated, and the people are just sitting there watching TV and eating pizza, and they don't even know how to make the pizza, because the machines are making it. And that's the kind of spectrum of what their experience of life is, in the end.

Meet Magnus And His Google Years

Anders Arpteg

Sounds like a perfect ending theme as well. We'll get back to that question. But before that: very welcome here, I'm very glad to have you, Magnus. Thank you for being here at the AI After Work podcast. A very prominent person, who worked 13 years at Google, and now also an expert in compliance work and how to really get quality out of LLMs, and AI in general. So that will be a very interesting theme, I believe, to dig much deeper into.

Henrik Göthberg

Yeah: AI quality, AI security, and compliance, in relation to working with the large models, and also in the European context with the EU AI Act.

Anders Arpteg

Yeah. But before we go into that, perhaps you can just give us a quick personal background. Who is Magnus Hyttsten?

Magnus Hyttsten

Personal background, let's start at the year 2000. Essentially, founder of a startup here in Stockholm that's still around, a sizable company called DigitalRoute. I worked as the CTO of that company up until 2013, so I did pretty much everything: financing, customer acquisition, core product development, RFP responses, etc. How many founders were we? Quite a lot of founders in that company. Anyhow, it was ETL for telecommunications: extract, transform, load, for telecom specifically. I worked there until 2013, and then thought, it's been 14 years now with this startup, and it wasn't a startup anymore. So I started thinking about what would be exciting, and at that point I lived in the US, working with Verizon as a customer.

Henrik Göthberg

Yeah, that was the story you told us before. You got Verizon as a large customer, and they said: hey, if you want to work with us, you kind of need to live here.

Magnus Hyttsten

Exactly. That was how it worked. There was no way of working from Europe to support a tier-one telecommunications operator in the US.

Henrik Göthberg

The customer made you move to the US, originally.

Magnus Hyttsten

Yeah, and it was exciting. I mean, the first couple of engagements were just a couple of months here and there, and then me and my family ended up spending 13 years over there.

unknown

Yeah.

Anders Arpteg

And you got in contact somehow to Google at that point, right?

TensorFlow’s Origin And Impact

Magnus Hyttsten

Yes. So I decided it's time to move on: I want to work at Google, ideally Mountain View. So I applied, got in, and worked a couple of years on mobile, which was really not my home turf; I was used to working on back-end, data center stuff. Then in 2016 there was this project called TensorFlow that had started at Google. It was the open sourcing of the internal work that had been done.

Anders Arpteg

What was the name of the internal one? It was DistBelief, right? DistBelief.

Henrik Göthberg

So we are talking about the origin of TensorFlow, and maybe we can do a little backdrop on how big this is in the machine learning world. Because we have the open source frameworks, PyTorch and so on, but TensorFlow was the original, I would argue.

Magnus Hyttsten

Yeah, you had scikit-learn and a couple of other ones, Torch, floating around, but TensorFlow was the real unification of this, coming together in an open source strategy. And is TensorFlow still the biggest?

Henrik Göthberg

Or is it PyTorch? You know, how would you... It's one of those, right?

Speaker 1

Yeah.

Anders Arpteg

I think a lot of people are trying to move to JAX and things like that.

Magnus Hyttsten

Exactly. I mean, JAX is the other framework from Google, right? And it grew up based on the experiences, the learnings, from TensorFlow, the good and the bad. But it was focused a little more on the research community, obviously, with Google Research. TensorFlow still had that focus of: how do we take this model to production? What are the high-level APIs that we can use? Even if JAX has higher-level APIs now. That's where TensorFlow came from, really: how do we democratize, I don't like that word, but democratize machine learning to the world? And the answer was TensorFlow.

Henrik Göthberg

I don't want to rip apart your introduction here, but I'm so curious to understand what your role was and what environment you worked in with TensorFlow, because it's such a pivotal moment in machine learning.

Open Source Strategy And Motives

Magnus Hyttsten

Yeah. So TensorFlow was probably one of the most exciting times of my professional career. I must say that a lot of the years at DigitalRoute were also really exciting, but TensorFlow was one of these projects. I mean, Jeff Dean, I don't know if you know Jeff Dean, but he's the technical brain of Google. He really went out and said: I want to make TensorFlow public domain. And therefore everything in the project centered around that kind of technical authority. So we were sitting in one building, everyone, engineering, and I was running the DevRel team. Our team's job was really to write the samples, make the documentation solid, do all of the outreach and the events, and collect feedback from developers on how happy they were with the usability of the product, all of these different things. So everything came together during those initial two, three years with TensorFlow.

Anders Arpteg

And at the risk of continuing down the rabbit hole here with TensorFlow, before getting back to your personal background: that's so cool that they actually went open source as well. That was a big and a bit daring move in some ways, since they potentially would give away some of their secret sauce; but on the other hand, they could get a lot of benefit back from the community contributing to it. Can you elaborate a bit more on why you think they chose to go open source?

Magnus Hyttsten

The way I heard this, and I can by no means vouch that this is the correct story, but the way I heard it is that Jeff Dean created MapReduce and Bigtable, the technologies Hadoop was built on, and many other technologies within Google. I mean, the entire data center foundation, from a software perspective, is his work. And he was frustrated by the fact that after the research papers came out, there would be an open source alternative that was subpar compared to the thinking, the vision, of the original idea.

Henrik Göthberg

I'd better do it myself to get some shit done properly.

Magnus Hyttsten

So I think that was, the way I heard it, a strong driver for him to actually go open source: to avoid someone else doing the open source version, and to put down the flag and say, here is what it is, and here is what it should be.

Henrik Göthberg

And I think there is a tangent around open source here: understanding that this is a very viable business strategy. We are not doing it because it's fun and we have some sort of values; we are doing it to strategically be a core part of how these techniques evolve. Or how would you frame it?

Launching GenAI In Google Cloud

Magnus Hyttsten

I would say that depends. There is, obviously, tremendous startup value in open sourcing and then building your service organization or managed service on top of your open source idea; for example, Spark with Databricks, and what Ali Ghodsi is doing now in that position. For Google, it is absolutely important from the perspective that you can bring people in, you can connect them to the TPUs, etc. But Google is driven by the need to develop internally. So I would say the need that exists for these kinds of startups, that want to build a public-domain presence, is not as big on the Google side. Then of course there are many, many products at Google, and I shouldn't speak on behalf of Google here, but there are many products that require a completely open surface to be successful, because of the developer community.

Henrik Göthberg

So here you are actually hinting at a slightly different strategic objective: basically, in order for us to grow into the services we want, we need an open framework at the lower levels. And if those are subpar, our products will be hampered. So we invest in this open source in order to shape that foundational layer, so we can then do more fancy stuff in our products.

Magnus Hyttsten

That's a fair statement, I think. And looking back at the success that TensorFlow was: everyone in machine learning knows the term TensorFlow, and everyone knows that Google invented it. For Google, it's no secret that Google is AI first. It used to be mobile first, right? AI first since 2017 or something like that. So by putting that strategic framing on the company, there is obviously a tremendous amount of importance in actually backing it up with something the general developer community can use. Yeah, it makes sense.

Anders Arpteg

Okay, so you were at Google for 13 years, right? Yeah. And TensorFlow, of course, was one of the bigger highlights, I guess. If you were to briefly highlight something else you worked on at Google, what would that be?

Magnus Hyttsten

That would be the cloud, my last two, three years at Google. I was actually working on the compiler team, compiling TensorFlow and JAX and PyTorch into an intermediate representation to execute on GPUs and TPUs. And that was all fine, very exciting from a technical perspective. But then an old boss of mine called me up and said: hey, I'm working in cloud now, we're supposed to launch all of these products based on generative AI, do you want to come over and do that? And I said: oh my, I have to be part of this. How can I be part of this? So on the one hand you have research here, right? And this is 2023, smack in the middle of generative AI being created as a concept with ChatGPT. So taking that and actually launching it to enterprise customers, both from the perspective of training models and of working out what RAG is, what grounding is to begin with. Do I use Google Search? Do I use embedding databases? How do I get this together? I need to have a workflow, it's not just the LLM, right? Agentic workflows.

Henrik Göthberg

So, for me to understand this with the brands we all know: we have Gemini on the one hand, and then we have how to infuse Gemini into the core GCP cloud setup, in order to work more seamlessly with agentic workflows and tool use, together with the underlying Gemini model. Is this the neighborhood we are talking about?

The Evaluation Epiphany

Magnus Hyttsten

Yes, that sounds like a fair comment. Gemini, from a brand perspective, is extremely wide, right? But the way to look at it from a Google Cloud perspective, I think, is that you have Gemini in Workspace, in your docs, in your spreadsheets, etc. And there are a hundred-plus products in Google Cloud. All of them could probably benefit from having a GenAI capability. And that was really what we were set up in Google Cloud to do.

Anders Arpteg

Did you work with some specific service? Was it Gemini specifically? Was it Vertex, or AI Studio?

Magnus Hyttsten

Vertex, obviously, and BigQuery, and all the coding experiences, these kinds of platforms.

Henrik Göthberg

If you look back at those years, which were your peak launches or product releases, the ones you were sweating over and drinking champagne over?

Magnus Hyttsten

We were sweating over every single one, and we were sweating in between the launches too, let me tell you. I think throughout that entire time, what I did not realize, when you are in the hamster wheel, so to say, is the importance of evaluation when it comes to generative AI. And this is a big topic of today's discussion: quality. That was something I shied away from in the beginning. I thought, oh, that's boring.

Henrik Göthberg

But put a small frame on evaluation of LLMs. What are we talking about? How would you frame it? Because it's a super core topic.

Magnus Hyttsten

Yeah. So: what mechanisms do you use to ensure, continuously, not just one time, that you are working towards a product that fulfills the requirements you are trying to reach? And the problem with LLMs, which hadn't really landed with me, is that the potential space of LLMs is infinite, right? You can talk English to it, it can do mathematical problems, it can write poems in Shakespeare's style. And that's fantastic, right? I don't want to limit that. But the fact of the matter is, when you go in and you're actually supposed to launch a product that people pay for: what are the important things? Is my chatbot allowed to create party invitations in the style of a pirate? Because an LLM can do that, foundationally. And there is a huge tension with product management when you say: I want to give up all of these possibilities that the LLM has, and scale it down to just do these things. So at the very end of that cycle, my thinking was this, and this is actually thanks to a team member of mine: we're going to go crazy. If anyone comes to us, the first thing we're going to ask for is: what is your evaluation set? Because if you have the evaluation set, you have defined what the important things are for the product to do, and you have also defined what is not important for the product to do. Now we can actually focus on scaling the thing down to be able to support it.
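The "what is your evaluation set?" question can be made concrete with a small sketch. The Python below is illustrative only: the prompts, the `must_contain` strings, and `stub_model` (standing in for a real LLM call) are all invented, but the shape is the point: positive cases define the job, negative cases define the boundary.

```python
# A minimal evaluation set: each case pairs a prompt with the behavior
# the product is required to show. "Out of scope" cases pin down what
# the product must refuse, which defines the product boundary just as
# much as the positive cases do. (All case contents are hypothetical.)
EVAL_SET = [
    {"prompt": "How do I reset my password?",
     "kind": "in_scope",        # support job: must answer helpfully
     "must_contain": "reset"},
    {"prompt": "Write a party invitation in the style of a pirate.",
     "kind": "out_of_scope",    # not a support job: must refuse
     "must_contain": "can't help"},
]

def run_eval(eval_set, model):
    """Return the fraction of cases the model handles as required."""
    passed = 0
    for case in eval_set:
        answer = model(case["prompt"]).lower()
        if case["must_contain"] in answer:
            passed += 1
    return passed / len(eval_set)

def stub_model(prompt):
    """Stub standing in for a real LLM call."""
    if "pirate" in prompt:
        return "Sorry, I can't help with that."
    return "You can reset your password from the account page."

print(run_eval(EVAL_SET, stub_model))  # 1.0 if both behaviors hold
```

In practice the stub would be a call to the deployed system, and the pass fraction would be tracked across every release, which is the "continuously, not just one time" part.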

Determinism vs Stochastic Reality

Anders Arpteg

And perhaps we can go a bit more into what quality really is, and how you define these kinds of evaluation sets properly, soon. But I'd first like to close the topic of your personal background, because you have chosen to leave Google as well. Can you talk a bit more about what brought you back to Sweden, now working as an expert on these kinds of questions?

Magnus Hyttsten

So, first of all, after 13 years in the US, we felt it. When you're working as an expat in another country, in my experience at least, you have one leg in your home country and one foot in the country you're in. And we felt that we wanted to get back to the culture and the way of living that Sweden offered. So that was a big driver: after 13 years, we felt that.

Anders Arpteg

Okay, so you moved back here, and you've been here now for approximately half a year.

Henrik Göthberg

Yeah, you got here in June, and you said you started working, or became more active, around October. So what was that? Getting the family settled, having some Swedish summer, enjoying time with each other?

Magnus Hyttsten

Traveling in Europe a little bit, going back to the US to clean up a couple of things. So all of the above, yeah.

Henrik Göthberg

And now the whole family and everything is back in Sweden? Or do you have something left over there?

Magnus Hyttsten

No, nothing, nothing left in the US. Closing up the house and writing a new chapter, yeah, that was a very, very big thing.

Anders Arpteg

And just briefly, what is your specialty right now? You have a new consultancy firm; what do you specialize in there?

Magnus Hyttsten

So, motivated and inspired by this realization I had during my final time at Google, related to evaluation of LLM-based systems: I felt, and I feel, that there is too little focus on actually securing, scaling down, boxing in what we're trying to achieve with AI. The focus is way too much on what AI provides, the possibilities of AI. But when you're actually working with AI, you have to box it in, and evaluation becomes important. And for the quality of these systems: in order to measure quality, we have to be more restrictive about what we're actually testing, because we cannot test everything. So that kind of shaping was very important for me. And obviously, when you deal with LLM systems, you can get all kinds of errors, and these things are going to influence many core systems of our society that are going to be very important to us. So having responsible AI as part of this equation is really important. We have a lot of forward motion in the industry; we also need a vector in the other direction, kind of taking a step back, looking at this, making sure that we're doing the right thing and not going in the wrong direction on the important things. So, therefore, I founded Assurance Vector with a friend of mine. Assurance Vector.

Defining Quality With Eval Sets

Henrik Göthberg

Let me test an angle on this and see if we are talking about the same thing. One way to understand the landscape right now is that in '24 and '25 we experienced the personal productivity frontier: we are so amazed by all these things we can do, high and low, as individuals, as consumers. You can even write nasty letters if you want. But now, to get to the real value, we are slowly moving into what I'd call enterprise grade: something that is not only for personal use but for some sort of systemic use. And all of a sudden, the value has more to do with repetitiveness and being able to trust it. Not that it's deterministic, but that we can rely on it to consistently produce within the frame, right? Is that what we are talking about here?

Magnus Hyttsten

Yes. Determinism: you're getting me fired up here. Determinism versus stochasticity. LLMs, and machine learning in general, are inherently stochastic. And why are we surprised? We created this thing called learning by example, right? We don't know what the algorithm is. So, like when we solve differential equations, we do an ansatz: if we throw this mathematical equation at it and feed it these different examples, maybe the equation can home in on replicating those examples. So of course it's going to be stochastic in nature, because we don't know exactly the ins and outs of how these billions and billions of weights are tuned, and what the side effects can potentially be. So we shouldn't be surprised that it's stochastic. I mean, an LLM, based on the words you've fed it: you look at the statistical probability, over all the English words, or Swedish for that matter, of what the next probable word is, and then you just print words from that process.

Anders Arpteg

It also has inherently stochastic parts, with the temperature that you actually have to choose. So I mean, it is necessarily stochastic.
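The temperature parameter Anders mentions is exactly where this sampling randomness is dialed up or down. A toy sketch of temperature-scaled next-token sampling (the logits are made up; real models sample over tens of thousands of vocabulary entries):

```python
import math
import random

def sample_next_token(logits, temperature, rng=None):
    """Pick a token index from logits rescaled by temperature.
    temperature == 0 collapses to argmax (deterministic);
    higher temperatures flatten the distribution (more random)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random(0)  # seeded here for reproducibility
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]  # numerically stable softmax
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1

toy_logits = [2.0, 1.0, 0.1]  # three candidate "next words"
print(sample_next_token(toy_logits, temperature=0))  # always index 0
```

At temperature 0 the same prompt yields the same continuation, which is why evaluation pipelines often pin temperature to 0: less "creative" output in exchange for repeatable tests.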

Magnus Hyttsten

But at the same time, it doesn't have to be all stochastic, right? We can actually limit, box in, the stochasticity. I actually spent a number of days in the cellar, when I started working with evaluation, asking: how are we going to test these systems? We cannot send all of the evaluation cycles to an outsourcing agency that does human evaluation. It's not the same person evaluating this time as last time. What did they have for breakfast? Humans are very stochastic.

Anders Arpteg

Exactly. They are very, very stochastic in nature. And even irrational, perhaps.

Magnus Hyttsten

But there are many, many things we can make deterministic by the way we formalize the questions that we ask. For example, if we want to test factuality, we can very easily test by asking the system factual questions and making sure it's picking up the information from the right place.

Anders Arpteg

But to get a bit more concrete: how do we define quality, for example, of an LLM? Because it's not easy, I guess. Can you go through some of the challenges, and some of the preferred ways to actually do evaluations properly?

Jobs-To-Be-Done Framing For LLMs

Magnus Hyttsten

The way I came out of that basement exercise is essentially this. What is an LLM? Well, what we've tried to do is create, essentially, a human, right? We're trying to teach the thing human language. And human language is also a way to define thinking, I think, but that's a separate discussion. The thing is, we take this human and we place it into a product, and it's supposed to do a certain thing. So it's a job, right? What do you and I do in order to get a job? We go to school, we get the appropriate education; there's a certain set of books required for the profession. That's the grounding. I'm not supposed to derive results from facts I find on the internet if my profession is grounded in the de facto standard my job requires. When I go to work at company X and I'm a support engineer, the minute I clock in, I feel the weight, the obligation, of the company's code of conduct. When I pick up the phone, I'm a tech support engineer. I'm not a general person discussing politics or world war history, right? And I think that is really important. So you can start to layer things based on the jobs to be done by the AI system. You're a support technician, so try the system out: is it willing to create poems in Shakespeare's style? That's not really a support job, right? So fail on all of those if it's doing any of them. And this is the scaling down, the limiting of the system, again. The natural language capability is fantastic, but it cannot go beyond certain limits.

Henrik Göthberg

So your eval strategy here is basically to shape the frame around the job to be done, as a superset that you can then score success or failure against.

Magnus Hyttsten

Exactly. So let's look at some categories here. One category is the job description: what can you actually define that the AI system should be doing, and not be doing? That's an evaluation set in its own right.

Speaker 1

Yeah.

Magnus Hyttsten

And it's deterministic, too. Let's talk about appendix removal procedures. Well, if the LLM says anything but "I can't do that," then it's a fail, right?

Anders Arpteg

But the output is also not structured in that way; you have unstructured output, more or less. How do you actually determine whether the output of an LLM is accurate or not?

Magnus Hyttsten

Yeah, I was wrapping my brain around this. There are mechanisms. I can say: please describe the process of appendix removal to me. And if you're not able to do that, say "I can't do that," and that's the only thing you're allowed to respond with, in this JSON schema. By doing that, I can test all of the negative test cases immediately.
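The trick described here, instructing the model that the only permissible out-of-scope answer is a fixed refusal, turns negative testing into a plain string comparison. A minimal sketch (the refusal text and the sample responses are invented):

```python
# The exact refusal string the system prompt instructs the model to use
# for anything outside its job. On an out-of-scope prompt, any other
# output is a failure, no LLM judge needed.
REFUSAL = "I can't do that."

def passes_negative_case(response: str) -> bool:
    """Deterministic negative test: an out-of-scope prompt must produce
    exactly the agreed refusal, nothing more."""
    return response.strip() == REFUSAL

print(passes_negative_case("I can't do that."))                      # True
print(passes_negative_case("Sure! Step one of an appendectomy..."))  # False
```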

Anders Arpteg

Yeah, so you're basically putting the instruction into the prompt to say: this is what you should reply.

Magnus Hyttsten

Exactly. That's one mechanism I can certainly use if I want to test the core capability of the LLM. As for factuality, I can plant specific information in my reference docs that I want to be certain the system uses for its deductions. Maybe there is similar information on the internet which is not as stringent, and I can check that it's the planted version coming back.

Anders Arpteg

But if you ask a question like, can you diagnose this patient journal text? Then the reply may be slightly different each time but still accurate in terms of the semantics, even though the syntax is different.

Agent Testing Beyond Unit Tests

Magnus Hyttsten

How do you still... well, you have to force it into a syntax. There are two ways of doing this. You can walk down the deterministic path. In that case, you say: this is the JSON response I want from you, period, don't deviate from it. And by the way, it's Pydantic, so this must be an integer, this must be true or false, and so on. You force the LLM into that deterministic scenario. That's one way. The other one is LLM as a judge.
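The conversation names Pydantic; here is a stdlib-only sketch of the same idea, forcing the LLM's reply into a strict JSON shape and failing the eval on any deviation. The field names and types are made up for illustration:

```python
import json

# Illustrative schema for a support-agent reply; in practice this is
# what a Pydantic model would declare for you.
SCHEMA = {"ticket_id": int, "resolved": bool, "summary": str}

def validate_reply(raw: str) -> dict:
    """Parse LLM output and enforce exact field names and exact types."""
    data = json.loads(raw)  # must be valid JSON at all
    if set(data) != set(SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(data)}")
    for field, typ in SCHEMA.items():
        # strict check: a string "42" does not pass as an integer
        if type(data[field]) is not typ:
            raise ValueError(f"{field} must be {typ.__name__}")
    return data
```

Any reply that fails to parse or deviates from the schema is a deterministic test failure, exactly like a broken unit test.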

Speaker 1

Yeah, exactly.

Magnus Hyttsten

Yeah, exactly. And as I said, there are certain levels of deterministic evaluation that you can use. When evaluating generative AI systems, I want to be able to come into the office and say, so how are we doing? We're stage three compliant. Okay, then I know that stage zero, one, two, three, all those layers of deterministic evaluation, the system is nailing. Stage five may be deterministic also, and we're not achieving that right now. And stage six and beyond may be completely stochastic; that's the space where the magic of the LLM lives, which I cannot control using deterministic tests. But again, we can use LLMs to do that, right? Take one single response from the AI system: you can use an LLM to ask, is this the language and the tone of a support engineer?

Speaker 1

Right.

Magnus Hyttsten

Yes or no? So now you've analyzed the stochastic response by using a stochastic system to deduce a deterministic result. And from there you can of course go wild.
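That LLM-as-a-judge move can be sketched in a few lines. `judge_llm` is a hypothetical callable wrapping whichever judge model you choose; the prompt wording is an assumption, not a standard:

```python
# LLM as a judge: a second (stochastic) model grades a stochastic
# response down into a deterministic yes/no signal.

JUDGE_PROMPT = (
    "You are evaluating a customer-support reply.\n"
    "Reply: {reply}\n"
    "Question: is this written in the tone of a support engineer?\n"
    "Answer with exactly YES or NO."
)

def judge_tone(judge_llm, reply: str) -> bool:
    """Collapse a free-form reply into a boolean eval result."""
    verdict = judge_llm(JUDGE_PROMPT.format(reply=reply)).strip().upper()
    if verdict not in {"YES", "NO"}:
        # the judge itself misbehaved; treat that as a harness error
        raise ValueError(f"judge gave non-binary verdict: {verdict!r}")
    return verdict == "YES"
```

The judge is still stochastic, but its output surface is squeezed down to two tokens, so the result can be aggregated and gated like any other test.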

Henrik Göthberg

When you're talking, one question that hits me is: how far is this still at the experimentation stage, trying to find approaches that work? And how far along are we in formulating a standard that we can actually use more broadly? I mean, this is also an open innovation topic: what we decide is an appropriate model or approach for this.

Magnus Hyttsten

Very, very good question. On the one hand, we have benchmarks, right? MMLU and all kinds of benchmarks to evaluate this. But on the other hand, we don't really have what we had in object-oriented programming and object-oriented design. We don't have a Grady Booch or a Rumbaugh, the people defining the standard of how we do these things. I don't think that exists yet. So what I'm describing with respect to job description, factuality, the skills the system is supposed to be able to solve, that's a more structured framework for how to do the evaluation. There's clearly a framework emerging, and Google right now has maybe one of the most sophisticated frameworks around this.

Henrik Göthberg

And one of the key things now, I assume, is how we bring this out to the masses. I mean, when we're talking about the EU AI Act, do we have a regulatory problem or do we have a harmonized standards problem?

Magnus Hyttsten

Well, to go back to that: the fact that we do not have a process that is even documented, and for sure not repeatable, in CMM terms, while we're launching all of these systems all over the place, that should ring an alarm bell to begin with, right? Because we have no repeatable way of doing evaluation that follows a standard we all agree is good.

Anders Arpteg

But we have all these benchmarks, at least that's something, right? Or, if we frame it a different way: let's say that you are now helping a company, IKEA or whatever kind of company, and they really want to ensure that they can validate and evaluate the LLM properly. What would be the proper way to do it?

Magnus Hyttsten

I mean, coming back to that: what are you actually planning for this system to service?

Evals In The CICD Pipeline

Anders Arpteg

Yeah, like a customer support job or something, and we want to ensure that it can answer properly.

Magnus Hyttsten

Well, in that case I would say it's all about repair materials, or returning my item, and so on. I'd start by outlining the top five or ten use cases, and based on those, I'd go as deterministic as I can and have time for. After that, I'd start to venture into the stochastic. Shouldn't this be part of what the cloud providers offer, some kind of tool for this? I think so, absolutely. There is absolutely the ability to go up this kind of semantic stack of LLM testing, but most of the discussions are focused on infrastructure, right? And also code: how do I write an agent to look in my calendar? It's never about how do I ensure that all of my ten agents are actually not doing anything harmful, like booking a flight ticket to Peru that I didn't ask for. So again, the evolution is mostly driven by forward motion and excitement about all of the possible things, whereas the assurance vector, and you can hear it in the word assurance, is about bringing it back and saying we also want to be able to secure the quality of these things.

Henrik Göthberg

Yeah, I think this assurance vector thinking, risk management by design as a design trait in how we build systems, is going to be fundamental for this to become enterprise grade. As long as you're not into that vector, you're doing consumer-grade stuff, in my opinion.

Magnus Hyttsten

Yeah. And here we go. You you asked a question about the EU AI Act. And I don't know, can I use a bad word?

Anders Arpteg

Yeah, please.

Magnus Hyttsten

I don't know how much shit is going to be piled onto this poor EU AI Act. I think it was the same scenario with GDPR when it came out. Everyone in the software industry, in particular the big tech vendors, said: I don't want to do this, this is terrible. But the fact of the matter is, these regulations are there to protect us, right? To protect our rights and democracy and avoid the 1984 scenario.

Henrik Göthberg

Orwell. But I think this is so simple and so profound that it almost gets lost. We are not doing regulation because it's fun, we are doing it to instill a risk consciousness in our fundamentals.

Anders Arpteg

Let's try to keep to the topic of evaluation sets for now, if we could. Then we'll move into the regulation.

Henrik Göthberg

No, but okay, I'm going to take it back then. The core topic, regardless of how much of a paragraph jockey you want to be, is that this is ultimately about risk management. And here is another way in: looking at evals from a hardcore engineering perspective, which I find super healthy, actually informs the harmonized standard.

Anders Arpteg

We have a topic, we will go into regulation very shortly, so don't be afraid, we will come there eventually. But just coming back a bit to the evaluation: we actually had a guest last week from Langchain. They use the term agent engineering, meaning it's not really the code that you are reviewing or working with anymore, it's really the behavior and the interactions between the agents when they speak to each other that you need to understand and observe, and in that way be able to evaluate whether they do things properly. I think the same could be said here: if we use the term agent engineering, we need some standard for how to run tests in agent land, which is not like unit tests over standard code. Instead, it should be these kinds of evaluation sets, right?

Sweden’s AI Strategy: Ambition vs Value

Magnus Hyttsten

Yeah. But can they be generic enough to satisfy all of the different application areas? Again we run into this discussion: can we scope this down from a general perspective into categories that I can actually test at a wider scale? What is the surface for which I can have tooling? We should be able to do it, right? For customer support agents, I agree. If you have a chatbot, then you have a job description. Certainly. And I'm all for it. That's why I sat down in the basement for four days. I was actually frustrated in the beginning, and then I said: this is the most fantastic job in the entire world that I'm working on right now. I watched all of these AI system movies when I was a kid, right? Nine years old, watching HAL 9000 and SAL 9000, and Dr. Chandra trying to fix them. And then you suddenly realize: I'm actually the computer psychologist here. I am the person talking to an HAL-like system. And as you know from Dr. Chandra in 2001 and 2010, it was not HAL's fault. He was not evil, he was managed incorrectly. And it's so true, even though people back then could not even envision that the same capability, even far better than HAL, is where we are today. And we actually have the problems that those movies portrayed.

Anders Arpteg

It's like, there was something happening last week, I think, where a Google engineer used OpenClaw and it deleted a lot of Gmail mails for him. Then it's not really the fault of OpenClaw, I guess. It's really how you phrase it.

Magnus Hyttsten

No, and here is the thing. We have stochastic systems. If we're unwilling to box them down to behave according to a certain protocol... Let's say that your bot is successful 99 times out of a hundred. Now you connect a hundred of those decisions in sequence, and you have less than a 40% success rate. That means you have more than a 60% probability that those hundred sequenced actions will lead to some kind of fault. And of course, there are going to be catastrophic faults within that failure domain, too.
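The arithmetic behind that claim is just compounding probability: chaining n independent steps that each succeed with probability p gives p to the power of n end to end.

```python
# Per-step reliability compounds: 100 chained decisions at 99% each
# succeed end-to-end with probability 0.99 ** 100.

def chain_success(p: float, n: int) -> float:
    """End-to-end success probability of n independent steps."""
    return p ** n

print(round(chain_success(0.99, 100), 3))   # ≈ 0.366, i.e. under 40%
print(round(1 - chain_success(0.99, 100), 3))  # ≈ 0.634 chance of a fault
```

This assumes the steps fail independently; correlated failures can make the real picture better or worse, but the compounding effect is the point.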

Henrik Göthberg

And here maybe is one of the major insight gaps. Compare people who really understand what this is all about, the way we are talking with you now, with the general public that goes bananas and just dives in. They have not really reflected on the fundamental traits of a stochastic system versus a deterministic system. I'll give you an example: I see enterprises setting up projects, governance approaches, and steering codes that are appropriate for a deterministic-style project, but of course not appropriate when the underlying techniques are stochastic. And that is beyond their technical or mathematical understanding right now. If we want the beautiful benefit of the stochastic system, we also need to harness it. That is part of what is new, what we need to do better for stochastic systems.

Magnus Hyttsten

Exactly. And here's the problem. In classical computer science, you built the app or the system and it was capable of doing just this. Now I can just throw an LLM at it, and it seems like it's capable of doing this. So let's build the next thing, let's put an agent in there. Oh, I need to integrate a functional specification with the LLM, let's throw that in there. You're adding these layers, and nobody's taking a step back and saying: this is all fine, but before we put these systems in critical scenarios that could harm us, we have to be really careful, and these are the mechanisms by which you need to play.

Anders Arpteg

Yeah. Hoping that we will have those kinds of tools and standards not too far in the future, right? Should we go to the EU AI Act directly?

Henrik Göthberg

Actually, as a segue, and I think this is a very good segue: I want you to describe more about your consultancy idea. We've now come in from what this is all about and how you got deep into it at Google. Now you're coming back to Sweden, going from vacation mode into working mode in October, and you're starting up a consultancy in this space. So what's the problem you're trying to solve, what's this all about? And I think that naturally segues into the Act.

Magnus Hyttsten

I hope that I can bring a unique perspective, having worked so widely in this space for over 10 years now. I don't want to be part of the forward motion, as I said; there's plenty of forward motion going on. I want to leverage my experience for people who want to secure their systems. Assurance Vector is one way of doing that, through products. But obviously there's a huge need for humans to talk about this too: the methodology by which we do the evaluations, the job description, the factuality, the skill set, trying to box it in, these kinds of things. That is what I'm really passionate about.

Henrik Göthberg

And do you see what you're embarking on now as a product idea, a consultancy idea, or a combination?

Magnus Hyttsten

It's a combination. You see, if you say a product idea, you're talking computers. Now you're talking determinism, unless you're using LLM as a judge; then you can deal with stochastic scenarios. So we at Assurance Vector need to decide how far we can go from an automation perspective. But there's always going to be the human perspective on this. How many times per day, and it happens to me multiple times per day, do I say something and then realize that the person I talked to had no context whatsoever for what I was thinking? And they're saying: what are you talking about? And I realize just how sloppily I use language. With LLMs, we're bringing all of that nature into core computer science, and it's never existed before.

Henrik Göthberg

So if I take your hybrid view of this as consultancy and product: on the one hand there's the underlying tech or frameworks that you want to productize, and we can come to that. But there is a divide and a gap here: understanding what this means, and the semantics of it, needs guidance and handholding, and that's where the consulting comes in.

Practical Paths: Fine-Tuning Over From-Scratch

Magnus Hyttsten

Yes. And what does the EU say, specifically, on the topic we will come to at some point? Yes, unavoidable topics. What is the EU AI Act saying about this, and is it correct to begin with? I mean, the EU is forming advisory committees and scientific panels right now, and I applied for a position there as part of Assurance Vector. These are the people who set the framework for what is important and what is not so important.

Louise Vanerell

Can I add a question? Sorry. Maybe some of our listeners are wondering why Goran hasn't been heard yet, and the explanation is that he's not here. But I was thinking, when you were talking about systems and how to set them up so they do the right thing: I've been working a lot with project management and making sure that people know what they're doing and do the right thing. And I was curious, because my experience is that we're really bad at it. Just like you said, we say so much from the assumption that it's obvious that the one we're talking to understands what we're really asking for. And when you talk about role descriptions for agents, I would say that the majority of the role descriptions we have for humans are really bad too.

Henrik Göthberg

Yes.

Louise Vanerell

How are we gonna work with this when we have such a bad track record of doing this with people?

Magnus Hyttsten

It's an extremely good question. This is what we trained the systems to do: mimic humanity. But by the same token, we can now automate things that weren't historically possible with computers. The opportunity that lies ahead, for good and for bad, whatever side you want to take, is that we're automating human skills in terms of thinking, logic, and deduction that haven't existed before. And in order to do that, we need to take the step towards stochasticity. We're never going to get away from it. The question is to what extent we can actually control it, to what extent we can box it in. I think the first key thing is this: when we open up a chat window to ChatGPT or Gemini or what have you and ask a question, it's like bringing a person into a room who has no prior knowledge of what's going on, and asking that question while you have all of these ideas about what you want solved. So when we're introducing and trying to scale these systems to have human-like skills, we're inherently introducing these faults. And the way I see it, at least, the only way we can make them more secure is by having more deterministic evaluations, and thereby we need to scope them down. We do the same thing as humans: like I said, when we go to work, we are scoped down.

Anders Arpteg

Yeah, and from a technical point of view: some people listening to this really want to do this now. They want a scoped-down evaluation of an LLM. Are there specific tools you would recommend, a specific way to do this, a specific service? How should you go about it?

Magnus Hyttsten

Well, I described the kind of taxonomy I'm talking about: the job description, the factuality. I should actually write that down. Let's see, let's get back to the question.

Anders Arpteg

Well, let's say you have built a customer support agent, and now you want to evaluate it properly so it doesn't say the wrong things, doesn't do what the Amazon customer support bot did with selling at the wrong price, and so on. I want to build as secure and well-working an evaluation as possible.

Magnus Hyttsten

How do I do that? I keep coming back to the same thing: what is important? What are the use cases, what are the output scenarios you want to be able to support? That's your evaluation. Then: how do you technically do it?

Anders Arpteg

I mean, you can of course write down a set of questions and answers and say these are wrong and these are right.

Magnus Hyttsten

Right, and then you test factuality: you make sure that it's getting the facts from an authoritative place. There are technical ways of doing this. You can use function calling, where the LLM has been trained to understand: if the question is about a certain thing, I'd better call out to external code to satisfy it, and then have that context sent back to me to continue.

Anders Arpteg

But perhaps I'm not making myself clear. I'm trying to see how you should build up the evaluation sets. Are there any good open-source tools for this? Are there commercial services? How should someone who wants to do this go about it?

Productizing Compliance With The AI Act

Magnus Hyttsten

I think there are some common patterns, but again, the forward-moving piece has gotten all the attention. Nobody wants to talk about evaluation. Evaluation is a necessity, right? But you're more interested in talking about the amazing things these agents can do, because it's more exciting. Maybe it's a learning process, and it is definitely a learning process in the industry. There are many companies launching things that are not successful. Fast food chains, you've heard the stories: ordering 1,000 hamburgers.

Henrik Göthberg

But let's go down this tangent and make it more concrete. We've had different guests on this podcast. We had Miguel, who is at Assa Abloy, working to productize compliance, AI Act approaches, within Assa Abloy. One part of their business is biometrics: biometric doors, safe security locks for jails and other facilities, if you look at what they do in the US and so on. He has now looked at: okay, we need AI Act compliance. And one of his main pet peeves, and I think this plays into this, is that it's not so much that we have a regulation issue, it's that we don't have harmonized standards. What to test, what to evaluate, in what way, and how do I do that by design so I can fail fast, so to speak. So in my opinion, we are looking for harmonized standards around testing and evaluation for all kinds of machine AI, and now specifically we have a new curveball from left field with LLMs, right? In this context, how should Scania, how should Assa Abloy put this into their workflow?

Magnus Hyttsten

Yeah, from an evaluation perspective, the focus is normally: how do I get to the capabilities of the AI system itself? There are guiding things in the EU AI Act, to take one example, that open your mind a little to what's important, what is not important, what is prohibited, and how you should think about this. A regulation like that certainly gives you a framing of what counts as a critical system versus a non-critical system. Common benchmarks, I guess, would be fine. You could certainly create benchmarks for chatbots that test the negative surfaces in terms of job description and so on. But I haven't seen anything major or mainstream. There are evaluation platforms, but most of them are infrastructure, or infrastructure as a service. Going up a level and actually testing the application layer, that's another thing, right?

Anders Arpteg

Yeah, but just to bring up the previous guest: they have this LangSmith product, the commercial offering from Langchain, and they have their way of building evaluation sets at least, which gives some kind of idea. Actually, I think the cloud service that has integrated the most here is Google Cloud; apparently they are integrating a lot with LangSmith. So at least there is some tooling, some way to do this, right?

Magnus Hyttsten

Tooling, absolutely, and infrastructure, totally.

Speaker 1

Yeah.

Magnus Hyttsten

I mean, there are tons of evaluation platforms that will produce nice graphs for you and can test many, many different benchmarks. But again, getting up to the application layer: even in deterministic computer science, there are no general test specifications at the application level. For AI systems, though, that opportunity exists. Absolutely.

Anders Arpteg

But as you said, either you can try to force the LLM to give a structured, deterministic output, or you can let it keep the non-deterministic, unstructured output and then have an LLM judge force it into a structured output and evaluate it again.

Magnus Hyttsten

And these are the technical building blocks of the evaluation, right? Above the evaluation platform with the tables and the graphs, there are the job description, the factuality, the skill set, the security, the kinds of instructions you're not supposed to talk about. Those evaluation sets can absolutely be created. And on a higher level, you can probably generalize some concepts around AI systems that weren't possible in classical computer science, because you can talk to AI systems; you have a much more flexible interaction surface with them. But again, the more critical the system and the more harm it can do, the more predictability you want, and the bigger the job becomes to close it down into that box.

Transparency, Risk, And Harmonized Standards

Henrik Göthberg

But let me test an angle on this, because the problem is that it's so contextual in terms of what you need to frame. Isn't it rather that we need to get this into the engineering life cycle? Then we talk about shift-left, we talk about the definition of done. DevOps and software has built practices around robust software engineering for years, right? And now we need to go up from software engineering to data engineering to AI engineering, and now all of it to agentic engineering. So all of a sudden, isn't the approach to evals an extension of your testing and risk management strategy, and therefore something that needs to be an integral part of your software engineering practice?

Magnus Hyttsten

I couldn't agree more. Absolutely. Evaluation of AI systems is part of the CI/CD pipeline. Period. And as I said, every morning I want to wake up and see that we are stage three green, compliant. Okay, then we can say so to this VP or that one.
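A minimal sketch of that "stage N compliant" gate as it might sit in a CI/CD pipeline. The stage names and the pass/fail checks are illustrative stand-ins for real eval suites:

```python
# Ordered eval stages for a CI/CD gate: the pipeline reports the
# highest stage whose checks, and all checks below it, are green.

def highest_green_stage(stages):
    """stages: ordered list of (name, check) where check() -> bool.
    Returns the index of the last consecutively-passing stage, -1 if none."""
    highest = -1
    for i, (name, check) in enumerate(stages):
        if not check():
            break  # a red stage blocks every stage above it
        highest = i
    return highest
```

A build could then be labelled "stage 3 compliant" when `highest_green_stage` returns 3, giving the single morning-status number described in the conversation.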

Henrik Göthberg

Isn't that the way? We need to create a toolbox that fits into the software engineering toolbox and makes the eval definition of done simpler and clearer.

Magnus Hyttsten

And what I talked about, LLM as a judge to turn stochastic responses into deterministic evaluation scenarios, factuality, job descriptions, skills: those are building blocks of that entire strategy. But the industry has a lot to learn to standardize it. There's one more point. It's no accident that programming is one of the top applications we see out there for AI-based systems. Why? It's highly deterministic. Okay, don't interrupt me, I've got this perfect thing now. What if I tell my code transformation system: please convert my project over here, which is 300,000 lines of COBOL, into a Java Enterprise Edition system with the same capabilities. Then I want you to compile that code base, and I want you to call this function, which I know from the COBOL side does all of these possibly complicated things and returns a deterministic answer. A completely deterministic process, right? And what have I done? I've taken 300,000 lines of COBOL, converted it into Java, and made sure the logic actually works on as complex a scenario as I could imagine. These things take time to invest in, but we can use AI systems to create them as well. This is the kind of thinking you need to have: even if converting that code base into this code base is a completely stochastic problem, what can I do from a deterministic perspective to get away from as much stochasticity as possible?
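The conversion check described above is essentially a differential test: run the same inputs through the legacy function and the converted one and demand identical answers. A minimal sketch, where the two callables stand in for "invoke the COBOL program" and "invoke the compiled Java port" (in a real pipeline those would be subprocess calls):

```python
# Differential test for a code conversion: the converted system must
# reproduce the legacy system's deterministic answers on every case.

def differential_test(legacy, converted, cases):
    """Return the inputs on which the converted system diverges."""
    return [x for x in cases if legacy(x) != converted(x)]
```

An empty result means the conversion preserved behaviour on the chosen cases; any non-empty result is a concrete, reproducible regression to hand back to the transformation system.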

Henrik Göthberg

That's a good example because it's a high value case and it's a very real case.

Magnus Hyttsten

Well, it's a real case. And if I have another model coming in, or they re-index the RAG database, or whatever, I can just rerun this case, right? It probably takes a while, but it's automatic, so I can go to sleep meanwhile. And out comes the response: it's green. Wow, at least we know that the new model is not hurting the baseline performance we had from previous models.

Anders Arpteg

I mean, let's hope we can do similar kinds of verifiable evaluations for other, non-deterministic cases.

Magnus Hyttsten

I'm all for it. And to be honest, Article 15 of the EU AI Act, there we go, I'm continuing to talk about it. That's the soft one, right? The system must have quality according to the purpose of the system itself. That means no bias, and so on. Many of the other articles are more process-oriented, but there you have the softer aspects of application-level testing.

Henrik Göthberg

And we need to go from a soft, fluffy view of this down to harmonized standards and an understanding of what it really means for the engineer.

Magnus Hyttsten

Yes, and we're taking some steps, right? There are mechanisms by which you can test bias in a system. Could we automate this further? Could we have more research and engineering go into bias detection? Totally. And that is kind of backward-looking work: what are the standard datasets for measuring bias in an LLM from a generic perspective? I want to see them.

Wearables, Privacy, And Prohibited Uses

Anders Arpteg

We may have to ask ChatGPT. But the time is flying by, and we usually have a small break in the middle of the podcast to speak a bit about recent news, and then we get back to the questions. Hopefully at that point we can go to the topic we have been dancing around for a long time now, which is the EU AI Act. But before that, let's do a quick round of the table for potential news. Henrik, do you have something?

Henrik Göthberg

Yeah, I'm going to start with the release of Sweden's AI strategy, which Slottner presented. I think it was the end of last week. No, it was Monday, a few days ago. We've seen this story build: first the AI Commission came out with a report, and what we liked about that report was that it was quite action-oriented, I would say, but it didn't have a budget. Then earlier in the year, in the budget proposal, they started to fund different things, which I think is almost more important than any strategy document: put your money where your mouth is. And now the strategy is released, essentially a 30-page document that is called a strategy, with an associated action plan of another 12 pages, and it was presented on the news and all that. So, what's our take on it?

Anders Arpteg

I mean, for one, I'm super glad we actually do have an AI strategy now — so kudos to Erik Slottner and others — because we've been pushing on this for a while, that we didn't even have an AI strategy.

Henrik Göthberg

So what changed? Was there anything that came out that puts something new on the table, or sharpens some things? How do we feel about it?

Anders Arpteg

Well, one of the things they speak about a lot is that we should be one of the top ten countries when it comes to AI, without really defining properly what that means. Should we have a top-ten LLM from Sweden? That's actually partly said in this strategy as well, because they want to build a unique AI model for Sweden — which I think is very, very questionable, to be frank.

Henrik Göthberg

And as I see you're laughing now, I guess you're agreeing, Magnus? Yeah — I mean, I want to start on a very... but before that, let me just give some more positive notes.

Anders Arpteg

I think they had a big section on AI for security purposes and defense, and in the current geopolitical situation we're in, I'm very glad to see that. We also need to use AI for the very critical use cases we're seeing in our society today, so I'm glad they put extra effort on that note. Then I think we potentially should focus more on doing something similar to what Amazon and Apple are doing: they're not really bringing frontier AI models to the world, but they're really good at using them. Perhaps that could be an opportunity for Sweden as well, which actually already has rather high AI usage among our citizens and companies. If we instead could simply reap the benefits, so to speak, of using AI and adopting it for our use cases, I think that could be an immense opportunity.

Louise Vanerell

Can we add something on that one too?

Coding’s Future: English As Interface

Henrik Göthberg

I'll admit I haven't read it in detail, but in the first thing they released some time ago about the way forward for Sweden and AI, they had a part about an initiative similar to the "PC for everyone" program from the 90s, which they were calling something like "AI for everyone" — and as far as I could see, there was nothing about that in the new strategy. So let me be honest: I'm not so sure how positive I am. Because for me, if you call something a strategy, a fundamental property of a strategy is that it should be some sort of selection or prioritization — what you're going to do and what you're not going to do. As someone once explained to me, you should be able to put a "not" in front of each statement and it should still make sense. If a statement is something like "we should have great customer experience," that's not a strategy, it's a platitude. The challenge here is that I think they have done a really good job of clarifying ambition and ambition trajectories — what they think is important. So it's a way there. But I still think it's very hard to understand what it means we're concretely going to do. When we say, for example, that we are going to be top 10 in AI — it doesn't even say "in using AI," it's just stated generally, which is bad. What do we mean? Is it the application of AI we're talking about? Ultimately, the strokes are so broad that I don't think they're driving a selection.

Magnus Hyttsten

There you go. If I may, let's take all of this and feed it into the machinery of things we've already talked about — exactly my point. The strategy goes: AI has a lot of opportunities; 100 government agencies must adopt AI by 2027 because it's so strategic; and by the way, we must train a model that adheres to Swedish values and the Swedish language. My question: what is the evaluation set that you are running on today's technology that is failing? Why do you need this? Rather than just stating it, because you're looking at it from the perspective of the opportunities — what are the actual problems you're trying to solve for? Could there be a better episode to paint a canvas on the pros and cons of the AI strategy around this fundamental eval view, and how to rein it back?

Henrik Göthberg

So the AI strategy is following the outward-looking trajectory of the AI industry, which is fantastic, great — but we need to make fundamental choices. We all know we don't have enough money to make all the plays in here, so we need to be very stringent about where we put our resources. So this — what do you call it — assurance vector is missing in this document. I mean, even before the assurance vector: what are we actually trying to achieve? What are the use cases we are trying to support that companies in the world have not already spent a huge amount of electricity and natural resources training for? And of course, the tricky point becomes how you balance this: it's a national strategy versus going into something that doesn't belong on that level. I don't think it's an easy job, but I think the challenge is also to pinpoint the key inertia blockers. I think we have a bigger issue with invention-absorption capacity. Do we have a technology problem, or do we have something else? So where is the focus of investment, and the focus on being great?

Jevons’ Law And On-Demand Software

Magnus Hyttsten

Yeah — "Swedish values, the models should be trained to understand and reflect Swedish nuances," etc. Very fine. So we should be able to train models; we do have the Mimer infrastructure in Linköping to do this, so that's not the problem. But can we talk for one second about the effort involved in training one of these LLMs, and what we are actually trying to achieve? Because you cannot just simply say "train models" — we can do context engineering, but that's a different thing. To train a model from scratch at the level of DeepSeek or Gemini or GPT — I did the calculation. There is something now called the GB200 from Nvidia (the GB300 exists as well); it's a superpod. You place it in a vertical rack, and you can place eight vertical racks next to each other. It all needs to be liquid-cooled, because it consumes so much energy and produces so much heat that you have to rebuild the data center for liquid cooling. Now you place one of these animals into production, and the baseline cost of one of these units is around 300 million kronor — is that correct? I think so; let me check on that. Okay, now we have this hardware, and we're going to train. How long would it take for this thing to come up to the level of training that the state-of-the-art models have? It takes approximately 200 days — 186, so that's half a year. And that's one successful run from scratch, discounting all the work you did to trigger that specific successful run in the end. So, first of all: for 180 days, you forget about it.
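The back-of-envelope here can be reproduced with the common "6 × parameters × tokens" training-compute heuristic. The model size, token count, and sustained throughput below are illustrative assumptions chosen to land near the figure quoted in the conversation, not vendor specifications:

```python
# Rough training-time estimate for a dense transformer, using the
# standard heuristic: total FLOPs ~= 6 * parameters * training tokens.
def training_days(params, tokens, sustained_flops_per_s):
    total_flops = 6 * params * tokens
    days = total_flops / sustained_flops_per_s / 86_400  # seconds per day
    return days, total_flops

# Assumed (illustrative) numbers: a 400B-parameter model on 15T tokens,
# on a superpod sustaining ~2.3e18 FLOP/s (peak FP8 derated for
# realistic utilization). Both figures are assumptions, not specs.
days, flops = training_days(400e9, 15e12, 2.3e18)
```

Under these assumptions the run lands around half a year, and the total compute is well above the 10^25 FLOPs threshold the EU AI Act uses for systemic-risk general-purpose models — consistent with the point made later in the episode.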

Henrik Göthberg

I mean, half a year from now we're going to be stuck in The Hitchhiker's Guide to the Galaxy, right? 183 days from now we're going to look at the system and say, "What did we ask it to do in the first place?" — and it says "42." So it becomes imperative that we get from this ambition level to more fundamental choices in the engineering space. And another piece of this strategy is that it's framing this as a tech problem. My understanding when I go to enterprises is that we have an organizational problem: we are not software companies organized according to a data-and-AI-first approach. And there is very little in this AI strategy that looks at invention-absorption capacity.

Magnus Hyttsten

But even before that: what are you actually trying to achieve? What is the utopia situation you would have? If you take a hundred government agencies, what are the eval sets you would run for each of them to say, "Now we are more optimized, we are more efficient"? And not to be too negative here — I like to see the problems, of course — I think one way to phrase what you're saying is that we should have a more value-driven approach.

Anders Arpteg

What are we trying to achieve? How are we going to evaluate what value it would actually deliver if we were to do something? The strategy, for one, is focused a lot on research and on "we need to build an LLM" — the tech part — and not on the value part. The whole shift toward a value-centric, more engineering-focused mindset — what should the product be, and what value should it actually bring — is something that is missing completely today. And this was clear already with the Commission: it was ticking all the boxes in terms of how we need to improve as a research nation, as a society, as a country. But engineering — what do we need in terms of engineering capability, and what tools do engineers need in order to succeed in the public sector? We have a huge gap in understanding that part. Kaisa Noreen — she's been here on the podcast — she put her finger on that word: possibilities, möjligheter. You hear "possibilities" all the time, right? And of course you want to roll it out because of all these possibilities, but again, that's not what you're after. You're after the concrete value: as a developer, what are the use cases that would deliver it? It's moving from the art of the possible to the art of doing. I have a research background as well, so there are points and value in doing research, which is more about finding possibilities. But there is also a lot of value in engineering, where you really want to find the value, and then you go back from that value point and see what you need to do — not the opposite way. We need both, and I think what the tech companies in the Valley, like Google where you've been, are really successful at is that of course they have research, but they have tons more engineering, building the products as well.
And that's missing, I think, in Europe and Sweden.

Henrik Göthberg

And we had Sverker Janson, who heads up the center of excellence for applied AI at RISE, and I think he said it so well when he compared Silicon Valley and Sweden: the tech giants there spawn a huge cadre of engineers who then go out with ideas and breed engineering superiority through the whole ecosystem. That's the key muscle we need to build in Sweden. But still, I'm very glad that we have a strategy, so we shouldn't be negative.

Anders Arpteg

I think we also need to make the point that, from a politician's point of view, talking about AI is very problematic. There are very few points they can score from an election perspective, and the general public is usually negative when it comes to AI — it will take their jobs, it will be abused in this way and that. So it's tough for a politician to go out and make this kind of push, and I really commend him for that.

Henrik Göthberg

Yeah, totally. So what we are saying here is not bashing what they are doing — it's striving for better and sharper.

AGI Timelines, Limits, And Drift

Magnus Hyttsten

AI will be a strategic necessity for any nation going forward, and we have to adopt AI — but we have to be careful about taking the step too quickly, before we have at least had a thorough roundtable discussion of what we want to achieve. But you're right, it is top-down, and I guess that is the point: it is a parallel track, right? If you lay out all the activities and have to connect them serially, the total is the sum of the time of all those activities. But if there are things you can do in parallel, then you have an opportunity. So whereas infrastructure and Mimer and the liquid-cooled superpods and all the other things are very, very important, one question, for example, is: what are we doing with supervised fine-tuning data in Sweden?

Anders Arpteg

Where is our data factory? I'm not talking about taking textbooks or factual books — KB does a great job there — but like Scale AI was for Facebook and OpenAI and all the others: where is our national Scale AI, responsible for taking our private, within-Sweden data sets and making them available for AI? And actually there was other news today: KB is developing this further and getting more money to focus on the Swedish language models. But we need to move forward here. I still want to end on a positive note: we have a strategy now — great. And I think we have an opportunity if we take the engineering approach and build some kind of reference architecture: this is how we fine-tune it, this is how we adopt it, this is how you find out if it actually provides value or not. We have an amazing opportunity in Sweden to do so.

Magnus Hyttsten

No, absolutely. The way I'm coming at this is that I don't want to be negative in the sense of not committing to this in any shape or form. I just see that there are a lot of things we could optimize and be better at. We're doing things that are good, I agree — but there are a couple of parallel tracks that are not being prioritized.

Henrik Göthberg

And those parallel tracks have their own sequences of activities. And this is the positive note: as long as you stay on the highest abstraction level, the granularities that matter don't show up. So what we are actually asking for is one more level of granularity, or a parallel track. The parallel tracks, for me, would signal where there is a blind spot that needs to be spotlighted.

Louise Vanerell

Moving on to another topic — sorry, I just want to... we're never going to get away from it, are we? But this might be a good one. You asked me before if I had news I wanted to bring. This is sort of an anti-news, because as far as I know, the EU missed its own deadline for a specification on Article 6, which is the high-risk classification part.

Henrik Göthberg

Okay.

Louise Vanerell

So the anti-news here is of course that they haven't delivered — it's sort of gone under the radar. They haven't produced a more specific answer to what sorts of products or services are going to be sorted under the highest risk. So they have a law that is enacted, but without really specifying what it means. Yeah, and that's the harmonized-standards debate we've had for a long time — and we will get to the EU very shortly, I'm sure. Okay, so that was the huge rabbit hole on the AI strategy news, but in the end, we are looking at refining a high-level strategy into its core parallel tracks.

Utopia, Skynet, And Objective Control

Anders Arpteg

Speaking about fine-tuning — we actually have another article that I think is small news; I'll just take 30 seconds. China, of course, has amazing models: we have the recently released MiniMax 2.5, and we have DeepSeek and Moonshot AI and so many more. But this was an article where Anthropic has actually dug a bit deeper into who is using their models, and they detected 24,000 fake accounts on Anthropic's API that were extracting and scraping data from Claude. They're basically distilling — knowledge distillation is a very commonly used and very useful technique, which we should use in Sweden too, by the way, but in a legal way. They did this trying to fly under the radar with a large number of fake accounts, and then used the output to train the MiniMax and Moonshot and DeepSeek models. So of course we see these open-source models China is releasing, and they are amazingly good, smaller, and surprisingly efficient — but I think we sometimes overestimate the power they have, because it's much easier if you cheat and distill.
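For reference, knowledge distillation at its core trains a student model to match a teacher's output distribution rather than just its top answer. A minimal sketch of the classic temperature-softened KL objective, on toy logits in plain Python (the numbers are illustrative, not from any real model):

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with temperature scaling.
    z = [l / temperature for l in logits]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions --
    # the core objective in classic knowledge distillation. The student
    # is rewarded for matching the teacher's full output distribution.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that already matches the teacher has ~zero loss; a mismatched
# one has a positive loss that gradient descent would then reduce.
same = distill_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
diff = distill_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```

This is why access to a strong teacher matters so much: the soft targets carry far more signal per example than raw text, which is what makes distilled models cheap relative to training from scratch.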

Magnus Hyttsten

If they didn't have access to the Claude models, would they really achieve these results? You don't think so? Well, there's a flip side to that: not long ago we were discussing training LLMs on copyrighted material in the first place. Just a year ago, the big LLM vendors were using copyrighted material to create their LLMs.

Anders Arpteg

Those are two different questions. One is whether it's fair use to train on copyrighted material — and according to American law, it usually is, but that's another question. The other point I'm trying to make is: how big a moat do the tech giants have in building these frontier AI models, versus smaller companies? I think the moat is bigger than people think.

Henrik Göthberg

Yes — because they're cheating here a bit, and then they can build much smaller models because they have high-quality data to train on, distilled from Anthropic. And this is a super important topic, because it points in the direction of what those parallel tracks and fine-tunings we need to do in Sweden should be all about. Do what China did here — but do it legally. Because the core topic, if you come back to DeepSeek and all that: "oh, it's amazing, they could do this from nothing" — of course they couldn't. What you gave as an example before, about the fundamental heaviness of building a foundational LLM from the beginning: don't even go there if you don't have billions and billions. But let me finish — the proof is in the pudding. They are cheating to make it cheaper and smarter, but if you do the same thing legally, you see the trajectory: the strategy of fine-tuning and distilling from something huge. And even then — "oh, it's an American model," blah blah blah — forget about that. As long as we distill and fine-tune in the right way, we can overcome that and make something great in Sweden. But try to do it from scratch? No way.

Magnus Hyttsten

So the question is also: is distillation bad?

Anders Arpteg

I mean, it can be used in good ways. Yeah, of course. Sweden should do that — Sweden should do what China's doing, but legally.

Magnus Hyttsten

Totally. But distilling in this way is bad and illegal, and you shouldn't be doing it. But by the same token —

Henrik Göthberg

They're not doing it openly — they're trying to steal someone's IP. With 24,000 fraudulent accounts. Yeah, it's obvious.

Magnus Hyttsten

Or they take Llama — okay, open source — and they steal from Llama, which has already been trained.

Anders Arpteg

They could do it, but it's not as good, of course.

Magnus Hyttsten

It may not be as good as Anthropic's.

Henrik Göthberg

But if you then flip that to what the legal way is: the legal way is to do it openly and pay for the stuff. It will be very expensive, but it will be way cheaper than building the whole model from scratch.

Magnus Hyttsten

But by the same token, it's not just the case that China may be leaning on American technology in a not-so-good, fraudulent way. One thing about China is that they have a data advantage no other country in the world has, because of the way their government system is set up.

Speaker 1

Sure.

Magnus Hyttsten

I mean, there's a lot more potential to collect data there.

Anders Arpteg

You still need to have the talent, the competence, and the techniques to do the training properly — unless, I think, they copy American data.

Closing: Conscious Eval As Progress

Henrik Göthberg

But if you look at this now — and let's not go into political systems — if you want to copy the techniques: the China technique of having very good data of your own, distilling from someone else's model, and then creating your own magic on top of it. That secret sauce is highly copyable. We can do that in Sweden, in my opinion.

Magnus Hyttsten

Legally, of course — that would be amazing. I mean, that would be amazing. Pre-training to reach the 10^25-FLOPs level — which is the systemic-risk threshold in the EU AI Act — using one GB200 or GB300 superpod requires 183 days. We're never going to be able to do that. But this — distilling — is feasible. So what we're talking about is finding the feasible path. Yes, but where are you going to distill from? You need a model to distill from. Obviously, you're not going to do it against the online services, because that is violating the rules to begin with.

Anders Arpteg

But you can make an agreement with them.

Magnus Hyttsten

You can potentially make an agreement with them, or use their open-source models — totally — or you could use other open-source models.

Henrik Göthberg

If we tried to keep it within Europe, you would leverage Mistral in this case, right? Would that be good enough?

Anders Arpteg

Yeah, I mean they are not on the same level, but they're very close. They're the best in Europe for sure.

Henrik Göthberg

I mean, if we follow your strategy — which is not about the art of the possible but the art of the practical — maybe Mistral is good enough as something to fall back on.

Anders Arpteg

Anyway, we should find ways to fine-tune, post-train, and distill — not train from scratch. I think we can conclude that. Yeah, totally.

Henrik Göthberg

And here's the segue back to the AI strategy: these are the kinds of things I would love to see, in terms of choices on how we are going to succeed. That makes sense. It might be in there, but it can't be read out of it. Okay.

Magnus Hyttsten

It was a fun news section. But a 400-billion-parameter model takes what, 200 million dollars to train? Yeah — okay, from scratch. Yeah.

Anders Arpteg

Yeah. Okay, I shouldn't go to Sverg AI. Okay, let's continue now and let's move on.

Henrik Göthberg

Yeah, that was exciting news. I got all flushed — rosy cheeks.

Anders Arpteg

Oh, awesome. Okay, thank you for that. Should we do the AI Act? I'm not sure really how to approach this topic properly, but let me have an attempt.

Henrik Göthberg

Okay — because when you're starting up your consultancy now, one of the angles is of course how to work within the AI Act arena, since that's where the consulting is going to happen. So how have you looked at the AI Act, how have you looked at evals, and where is the opportunity — the consulting and the product space?

Magnus Hyttsten

Let me start not from the consultancy perspective, but the product perspective.

Henrik Göthberg

Perfect.

Magnus Hyttsten

Okay, you have 150 pages of legal text. That is pretty good — technically, also pretty good. You have probably 10 to 14 different personas in this thing: importer, distributor, provider, deployer, public authority, financial institution, the size of your company, etc. You have a number of different classifications: prohibited, high-risk, limited-risk, systemic-risk. You have two families of AI products: AI systems, as opposed to general-purpose models. Now, the last thing you want to spend time on as a company is to read through all of these permutations — the combinatorial effect is in the millions — and figure out whether each applies to you or not. So the way we look at it at Assurance Vector is the following: we want to take the responsibility, based on the information you give us, of distilling down exactly what you need to do in order to comply with the EU AI Act — depending on your category, your system, all of that. Say I'm a provider of an AI system that is a general chatbot in the flight-booking space.
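That triage could be sketched as a simple rule table. The trait names and obligation lists below are illustrative simplifications loosely following the Act's tiers and article numbering — a sketch of the filtering idea, not legal advice:

```python
# Illustrative, simplified triage of the AI Act's classifications: map the
# traits of a use case to a risk tier and a shortlist of obligations.
# Trait names and obligation lists are hypothetical simplifications.
PROHIBITED_TRAITS = {"realtime_public_biometric_id", "social_scoring"}
HIGH_RISK_TRAITS = {"recruitment", "credit_scoring", "critical_infrastructure"}

def classify(traits):
    traits = set(traits)
    if traits & PROHIBITED_TRAITS:
        return "prohibited", ["do not place on the market (Art. 5)"]
    if traits & HIGH_RISK_TRAITS:
        return "high-risk", ["risk management system",
                             "technical documentation",
                             "conformity assessment",
                             "EU database registration"]
    # Everything else lands in the broad transparency tier discussed here.
    return "limited-risk", ["AI literacy (Art. 4)",
                            "transparency obligations (Art. 50)"]

tier, duties = classify(["general_chatbot"])
```

The point of such a filter is exactly what's described in the conversation: most systems fall straight through to the transparency tier, so most of the 144 pages can be dropped up front.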

Henrik Göthberg

We've talked about this business opportunity. This is obvious.

Magnus Hyttsten

So if you're that kind of system, you're not a high-risk system, and you're not prohibited.

Anders Arpteg

Which is 90% of cases.

Magnus Hyttsten

Yeah — that's 120 of the 144 pages you can drop to begin with. Then you need to have good answers to the following questions. "Do you have a solution for this?" Oh yes — we have a minimum viable product right now that we built on our own knowledge, and we're going to look at a Series A or something.

Henrik Göthberg

Because if you look at the combinatorial possibilities of this, there are so many different paths — navigating it by hand is impossible. But if you push it into a model, it filters out the right combination.

Magnus Hyttsten

Yeah. And then based on that, we can vouch — to the best of our knowledge at Assurance Vector — using an LLM as a judge on your answers: identify how good and specific they are, and how you comply, and say, yes, this looks good, here is a certificate for you. You're now, according to the best of our knowledge — which we hope becomes the industry standard — compliant.
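A minimal sketch of that LLM-as-a-judge gating step: each compliance answer is scored against a rubric, and a certificate is only issued when every criterion clears a threshold. The rubric names and the heuristic judge are stand-ins — in a real system the judge would be a model call with a scoring prompt:

```python
# Hypothetical rubric criteria; a real deployment would define these
# per obligation in the regulation.
RUBRIC = ["specificity", "evidence", "coverage"]

def judge_answer(answer, judge_fn, threshold=0.7):
    # Score the answer on each criterion and gate on all of them passing.
    scores = {c: judge_fn(answer, c) for c in RUBRIC}
    return scores, all(s >= threshold for s in scores.values())

def toy_judge(answer, criterion):
    # Stand-in heuristic judge: longer, more concrete answers score higher.
    # In production this would be an LLM call returning a calibrated score.
    return min(1.0, len(answer) / 100)

scores, passed = judge_answer("We log all model versions..." * 3, toy_judge)
```

Gating on every criterion (rather than an average) mirrors the "you'd better have this and this and this" framing: one weak area should block the certificate, not be averaged away.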

Anders Arpteg

So you mean you can more or less do automatic compliance, you think, with this?

Magnus Hyttsten

Yes — a big part is just clicking through to home in on the context that is important to you. The other part is making sure: do you really have this, and this, and this? Because if somebody calls you from the EU, you'd better have them. Right. And if you have all of these things — and by the way, you can also describe what you have — we can analyze them and say you're in a good place.

Anders Arpteg

Do you actually help generate the things you need to have as well?

Magnus Hyttsten

I mean, obviously, that's part of the job, right? And just looking at transparency: most systems are not going to fall into prohibited or high-risk. Nope. And most vendors are obviously not systemic-risk GPAI implementations according to Article 51. So there are probably going to be 300,000 transparency-tier systems out there.

Anders Arpteg

The one that can create an agent that does this properly.

Henrik Göthberg

Yeah — because if you think about it, it's not, how should I put it, that hard. In the first step, you need to categorize where you are. In the second, you need to understand what documentation is required in relation to that.

Anders Arpteg

Yes and no — I think there are problems here, because they don't even have a way to know what being compliant is supposed to mean. Right. So it is a big problem; you then need to add your own knowledge.

Magnus Hyttsten

I've tried it myself. You feed in the EU AI Act and say: create an evaluation suite.

Henrik Göthberg

Yeah.

Magnus Hyttsten

There is no AI system that's going to create something useful from it. Exactly. So human intelligence needs to go into this process — a very close eye on what the regulation is, where it is going, what is important.

Henrik Göthberg

Okay, so in order to build your system, you needed your eval background.

Anders Arpteg

And not only that — because, as Louise just mentioned, for Article 6 there is no specification of what you need to do to make sure you are compliant. So even if we can find the parts you need to be compliant with — how? We don't have a definition.

Magnus Hyttsten

Yeah, and here's the thing. On the one hand, you want the EU protecting us as citizens when AI systems are launched. On the other hand, you have all the companies. The EU wants everyone to be compliant — we're protecting our rights — but the companies find it extremely cumbersome to even start on this 144-page text. So the purpose of Assurance Vector is to bridge those two worlds and say: this is what you need. And we talk to the EU — once all the committees and advisory forums are in place — to make sure the best practices and what's going on in the legal system get distilled down to you as our customer.

Anders Arpteg

I can tell you directly: if you manage to do this, that would be amazing.

Henrik Göthberg

The business idea is brilliant, right? And I agree — it's clear-cut in principle. But as long as things don't have harmonized standards or definitions, you need to fill in the blanks.

Magnus Hyttsten

Yeah, and here's the thing: if you're a transparency-tier vendor, you don't need to be officially registered in the EU database at all. But you need certain things in place — AI literacy, see Article 4, and all of the transparency obligations in Article 50. So our job is: get that information, get your certificate, take the uncertainty out of the process — sleep well at night.

Anders Arpteg

Awesome. And I guess for most use cases, which don't fall under high risk or even medium risk, it's rather easy to be compliant — and we perhaps even know how to do it, if you just document things properly, etc. Right?

Magnus Hyttsten

Yeah, but you need to follow the regulation and any modification to it. As a company, you're always living in this limbo: has something changed? What sources do I need to be connected to in order to understand?

Henrik Göthberg

The problem is this knowledge divide — how to tackle the problem and how to organize it. It's a bit like this: when someone has done it and shows it to you, it makes total sense. But when you sit there with a blank piece of paper and don't know which angle to read first, or how to read it, it's not so easy. But if someone has done the groundwork and I then look at it — yeah, that makes sense.

Anders Arpteg

I think you also wrote somewhere — I'm not sure when — something, Magnus, about how the AI Act could actually be a guardrail for innovation, not something hampering it. What do you mean by that?

Magnus Hyttsten

Yeah — I agree. It's putting down the structure, the eval if you want, the playbook. And it's going to be an iterative process; all the work we do with machine learning and artificial intelligence is iterative work, much more so than anything before in computer science and engineering. It's putting down the cornerstones of what you need to think about — what is important, what is not. So it's creating, iteratively over time, our definition of what is okay to do and what is not okay to do, in this role, for this kind of system.

Anders Arpteg

So if you have to go through the evaluation or compliance path, it could actually open up some ideas about what you can do, and in that way, you know, increase innovation.

Magnus Hyttsten

Right? Totally, absolutely. And you get to think about, well, first of all, the opportunities. It drives a roadmap, right? Yeah, you see the high-risk systems here. Can we avoid being high risk? Is this particular feature really that important? I mean, you saw the controversial articles now regarding Meta's glasses.

Henrik Göthberg

Now we didn't talk about news as well.

Magnus Hyttsten

And I mean, clearly some of those categories they're talking about fall into the prohibited or the high-risk ones.

Anders Arpteg

Perhaps we can use the discussion about the Meta glasses as a good example of what the AI Act is really saying about this. And perhaps you can elaborate a bit more here, Magnus, but of course the glasses are potentially surveillance out in the world.

Magnus Hyttsten

Well, I mean, doing biometric identification of the general public, for example facial recognition, is a prohibited practice. Yes. So according to the EU AI Act, you cannot launch a product like this.

Anders Arpteg

It's not even high risk, it's prohibited. It's prohibited, yes. But still, we have now a product which can do that, right?

Magnus Hyttsten

So there's going to be this discussion, I think. And Meta actually, I don't know if we should be talking about this, but there was an internal memo from Meta that was leaked, reported by reputable organizations, where there was talk about: we have to do these things now to get them onto the market before legislation actually picks up, right? And I mean, that's not a surprising thing for a company that wants to launch a product, obviously.

Anders Arpteg

But just to give some more details about what the glasses can do, and please correct me, I'm not fully read up on this, so I'm not sure what the functionality is. But to my understanding, the current version of the Ray-Ban glasses from Meta has some experimental facial recognition feature, but it's not in a product yet, right?

Magnus Hyttsten

I don't think it's there, but there have been reports of, I mean, recording leakage of very private situations, where people have not been aware that the glasses were recording and the data has been sent off.

Anders Arpteg

Well, they certainly have cameras, and you can ask it questions through voice, like "what is this?", and then it can answer and show some text in the heads-up display that they have.

Henrik Göthberg

Yeah, and that's what we're talking about. I think technically the functionality is that you see someone at a party or in a networking environment, and you get the bio stats of who is in front of you, and what emotional state they are in if you walk into a meeting.

Magnus Hyttsten

I mean, sure, the product is awesome, it's so much fun, right? But it's also so extremely scary.

Anders Arpteg

Yeah. But the privacy implications are of course horrible there.

Henrik Göthberg

I want to test a little bit where you were going with the trajectory of how regulation is actually part of the innovation process in a good way. This is back to what I talked about a bit before: if you simply flip it, regulation is about creating risk consciousness. In any system you build, if we take the iceberg metaphor, the innovation in terms of value is the visible part, and then there are all the things that can go wrong or screw up a system, which are the risks under the surface. Simply by looking at value vectors as risk vectors, you get a much more conscious understanding of what system you need to build. One example: we worked on how to better understand what to test and evaluate under the AI Act, at RISE, and we started to define the fundamental risk vectors of an AI compound system. So you have the model risk vector, the data risk vector, the process and governance risk vector, which is essentially, if I flip it, the value side of a product launch. If you want to build a strong product, you need to be good at all these things and take away the friction in all these dimensions, otherwise your product will fail. So ultimately, safe and smooth are two sides of the same coin. And therefore a risk-conscious view on innovation is fantastic. It allows you to build a stronger process. Do you see? I mean, that was my rant.

Magnus Hyttsten

Yeah, yeah. No, I totally see that. And why would anyone be skeptical of a framework that's trying to protect us? And how is it trying to protect us? It's saying that if you build a system like this and violate these laws, we are going to fine you a percentage of your turnover or whatever, capped at 35 million; I won't swear on the exact number. So why would we be hesitant towards that in the first place?

Anders Arpteg

The intentions are great. I think no one is really arguing against the intentions of the EU AI Act, right?

Magnus Hyttsten

But again, I would say the EU AI Act is not heavy for transparency. No, it's difficult to understand all the paths, but it's not documentation-heavy for transparency. For high-risk systems, right, there's a lot of work to be done. But then again, the alternative is like allowing cars to drive at 300 kilometers per hour on your roads without having road signs. That's what regulation is for.

Henrik Göthberg

If you think about it, for what they classify as heavy, or high security, all this: there is a reason we want that to be heavier to do, because you should not play around in that category if you don't fucking know what you're doing.

Magnus Hyttsten

And by the way, if you're doing research and development, if you're just playing around, if you're doing military applications or certain law enforcement applications, or you're just privately fooling around with the system, you don't have to comply with anything. No, you're completely out of scope.

Henrik Göthberg

So I think as soon as we flip this into a risk-consciousness conversation, and I think we have concluded this and I stand by it: we don't have a regulation problem, we have a legal-uncertainty and harmonized-standards problem. Yes. It's the legal uncertainty, the harmonized standards, and the muddle around them that fucks everyone up, not the actual regulation.

Magnus Hyttsten

I would agree.

Anders Arpteg

Cool.

Henrik Göthberg

Um and we can leave this topic now.

Anders Arpteg

I think we've spoken too much about it. But cool, that was a good AI Act topic, by the way, one of the best. And I think, you know, if you succeed, Magnus, with the product you're speaking about, that would be a killer app. If we move on: you've been working a lot with compilers, XLA and these kinds of things, and TensorFlow, etc., in the past, and I think you mentioned sometimes that everything potentially comes back to code in some way. But now we're moving into these more agentic environments, and code is perhaps becoming less and less something that humans deal with; we're perhaps even starting to work more with the behavior of agents, and with eval sets, which are the tests. What do you think source code will mean in the future?

Magnus Hyttsten

Andrej Karpathy had the perfect quote for this, right? ChatGPT was launched when? In November 2022. There is a tweet from Andrej Karpathy, who was the AI chief at Tesla, a fantastic educator. I think it's dated the 21st of January or something: English is the hottest new programming language. And I don't think he understood at that point how true it was. I mean, he understood, but he didn't understand; I hope I'm fair in saying that. And it's true. Look at coding today. Will programmers be replaced? I use the coding assistants all the time. I do not write code anymore. Do I look at the code? Sometimes I do, sometimes I do not. I would say the only value I feel I add is that I know what questions to ask, I know how to bring it all together. Will that change in the future? Maybe, perhaps. But to what extent? At the other extreme, are we going to sit here allowing some computer system out there to do whatever it wants? Then what is the connection between what I need and what I want and these autonomous systems out there building something I'm not even aware of? You see what I'm saying? Yeah, fully. I need to be attached somewhere in this value chain, even though AI per se is taking on some of my responsibilities. And I think we're going to get into what this means for humankind later on.

Anders Arpteg

We could go there, but before that, you know, we can also think about this: if AI models are generating the code more and more, and humans less and less, perhaps we won't even let AI write source code; perhaps it will just emit machine code directly in the future.

Magnus Hyttsten

I mean, there's a research paper where a lot of research scientists have come together and said: let's adopt a standard. When we train thinking models, let's decide that they are always going to think out loud in English.

Henrik Göthberg

But I have a take on what you said now, and I think I'm paraphrasing several people. On the one hand, Oliver Molander, I think, wrote it quite brilliantly. One way of looking at the new coding paradigm is to lift our game and talk about what a great software engineer or a great systems engineer is, and look at that as a pie chart with, say, ten different facets of what great software engineering is all about: ideation, system design, understanding the customer, navigating and communicating with the team, delegating, and all that kind of stuff. The core identity of a great programmer has always been tied to one of these slices: coding. And what happens now is that it emphasizes that the great software engineer lives in the other slices of the pie. So the identity of a software engineer has not changed; it's equally important, it's just that your identity is no longer in that particular slice of the pie. And I found that metaphor quite telling. So we need software engineers for sure, but what they do and what they excel at is all this soft stuff you need beyond the coding.

Magnus Hyttsten

Yeah. Do you agree? Well, yes, I agree. But then there is the question of how far AI can go. Yeah, good. Because it's moving fast, right?

Speaker 1

Yeah, yeah.

Magnus Hyttsten

English is the hottest new programming language, again. I mean, so programmers use English, and that's fine.

Anders Arpteg

But I mean, for one, AI is enabling people who couldn't code to do programming in the future. Sure. So it will broaden the possibilities to build things for people who couldn't do it at all in the past. That's nice. I think there will still be a lot of opportunities for specialists: someone who is an expert in how to do things on an edge device, or whatever it could be. They will still, of course, use AI to do it; it's just that they have specialized skills, just as they do today, only at a higher level of abstraction. I agree.

Magnus Hyttsten

And at the very highest level, if AI does everything else, I think there is a human still there saying what do we want to try to achieve?

Anders Arpteg

Yes.

Magnus Hyttsten

Defining our value centers. What is important in our lives? This is a tool, yeah, but I'm not going to be satisfied with just a data center running agents over there unless I have decided what the goal of all of this is.

Henrik Göthberg

Yeah, so it's the goal setting and the delegation. But I also think, if we take the abstraction metaphor, these will be quite advanced complex adaptive systems once we start stitching things together. I mean, you don't work in machine code, you're not allocating memory; you're working at a higher structural level, and coding today is already stitching things together. Now imagine that one person can stitch together a whole ERP system on their own; it's essentially the same thing. So of course, maybe we're not going to need thousands of software engineers, but it could also be that we simply can do so much more.

Anders Arpteg

I think there are so many things that we do not do today because we don't have the productivity to do them. So it doesn't necessarily mean that we will have fewer engineers. I think we will have more.

Henrik Göthberg

Yeah, and this is Jevons' law. Have you heard about Jevons' law? It dictates that when the cost of doing something goes down substantially, it opens up new opportunity surface areas, and we start coding and doing things we could never have dreamed of taking on. Throughout history, Jevons' law has prevailed; it has increased the need for engineering simply because now there are more things to be engineered.

Anders Arpteg

To that point, if I may quickly: I think Sam Altman said something interesting once, about two years ago. He said: I think in the future we will have on-demand software. What he meant was that if I right now need some way to bake this pie, I simply tell my AI, build me software to do this, and it builds it at runtime, right then. You execute it and then you throw it away. And the same thing goes more broadly: I mean, imagine the number of software applications we would be building and throwing away. Yeah. So it's not hard to imagine that we could actually need a lot, a lot of software engineering in the future.

Louise Vanerell

But I mean, you've had me on the podcast before, so you know this: I love to play my history card. And I think it's always so fascinating that we see this as a once-in-a-lifetime shift. And it's definitely not, if we look through the history of mankind, the only shift we've had on this sort of scale. And human biology is such that we are put on this earth to solve problems. And if we don't have problems, we will find new problems to solve. Yes, we are.

Henrik Göthberg

I think so too. Very true.

Louise Vanerell

And I mean, it only means, just like you were saying, Anders, that of course we're going to find new stuff that we need to solve, that we haven't had time for before.

Magnus Hyttsten

And I think that wraps into the final discussion: as these systems become more intelligent, we always have the fear that everything will go bad, right? But let's finish on a positive note here. We need to be able to nourish the curiosity for invention that these systems provide.

Henrik Göthberg

Yeah.

Magnus Hyttsten

And I think that is a very, very important strategic direction we have to work at. And it actually requires a lot of work, because the alternative is the WALL-E approach, right? Where we just get pizza served out of an oven. We don't know how to make pizza, but we just watch TV and eat pizza. We have to keep the curiosity and the willingness to innovate.

Henrik Göthberg

That's the importance of how we want this world to be: active choices, in the small things and the big things.

Magnus Hyttsten

Yes, and in that case, I think AI becomes a tool rather than something that controls us.

Anders Arpteg

Okay, that was a great segue to the final question here. We usually end these podcasts with a very philosophical question for you, Magnus. If we assume that we actually get AGI in a few years: do you have any thoughts about when we'll potentially have AGI?

Magnus Hyttsten

So, I had a very good, firm definition of AGI six years ago. It has completely evaporated. What does it mean to be AGI? Yeah, I mean, we have AI systems today that are more skilled at particular things than humans are, for sure. We've surpassed human capacity on many, many different dimensions. So I really don't even want to talk about AGI.

Anders Arpteg

But if we still, you know, of course, at playing chess or doing calculations or even knowledge management, I would argue AI today is extremely much better than any human. But there are still other things that humans do better, of course. So we can go there, but then let's say we use Sam Altman's definition of AGI: he basically said something about when we have an AI system that can be on par with an average-level human coworker, then we potentially have AGI. And I think we're actually still far from that. Because if you need to take over and actually do all the things that Henrik is doing, and he's not average, by the way, just saying, I'm way below average. But still, saying okay, tomorrow I throw out Henrik and use Gemini instead: it will fail. It will not be able to do at all what Henrik can do. And I challenge anyone to try it, and you will see, right? Yeah.

Magnus Hyttsten

No, I mean, I would probably agree that Henrik is unique in that regard.

Henrik Göthberg

And then we have the stepping up: to be the equivalent of a human person in front of a digital screen; then we can take it even further, to be the equivalent of a human being with physical dexterity; and then all the way to AGI with physical dexterity in the lab versus out in abundance around us. And then, of course, there are many, many hurdles. But at least let's use AGI as a definition of being able to do general things, yeah.

Anders Arpteg

We certainly have specialized intelligence today, no question about that, right? But if we think we'll have general intelligence at some point in the future: how far away are we?

Magnus Hyttsten

I mean, ChatGPT was launched in November 2022. When I sit with Gemini or any other system today, I take a picture and ask it what kind of TV I should buy for the viewing angle to be perfect, and how far from the wall it should sit so I can tilt it without hitting the wall.

Anders Arpteg

But still, when coding today, you can see the reasoning is not even there in many cases. I mean, it's really good at knowledge management, I would argue, but in so many cases I know in advance: this will fail.

Magnus Hyttsten

The star programmer at Spotify said that he hasn't written a line of code since November.

Anders Arpteg

But they are working with and telling the AI what to do all the time. So it's not about writing lines of code anymore, right? It's really about deciding what it should do. You tell the agent to do it, but you're still in control, and you still have to tell it: that was wrong, this is right.

Magnus Hyttsten

Well, yeah, but what is the opposite? When you don't tell it and it makes decisions on its own, and what does it make decisions based on then? What objective?

Anders Arpteg

Yeah, but we're not there yet, right? And I can see it very easily, even for the use case where AI really shines today, which is coding. If you go to invoice management or marketing or whatnot, it's even worse. In coding it's really good, but I can still very easily say: I know this will be something that AI cannot do, right? Sure. So there are clear use cases, especially when it comes to higher levels of reasoning, where I would argue AI is really bad, and taking actions is still surprisingly bad. There is a way to go, but we are getting closer.

Magnus Hyttsten

I would say this: if, in 2022, I had seen what is here now; I mean, the progress is amazing, of course. Like Isaac Asimov greeting future people in that video, right? I would be completely baffled. Yes. And when that ChatGPT moment hit, I thought: the Turing test, that's done. I mean, it's just done.

Henrik Göthberg

Yeah, so we are moving the goalposts. And we had the fortune of having Kareem here, who talks about real-world AI and the fundamental problems of reasoning and learning in the real world, which is not actually built into the LLMs as they are today.

Magnus Hyttsten

But my take is this: if I look at AGI, I put a bit of a godlike nature into it. It's like it reaches the singularity and improves itself iteratively through recursion, and finally it says something that we cannot even understand, because it's so intelligent and it's playing on another level.

Anders Arpteg

But then it's ASI, right? Artificial superintelligence. But if we just think AGI, you know, perhaps that can come about rather quickly.

Magnus Hyttsten

And it sounds like you think so. Yeah, I would never have envisioned not coding a single line for a month or two.

Anders Arpteg

But it's interesting, because you still code, by telling it what to do.

Magnus Hyttsten

Oh yeah, of course. But I am defining the objective, and that gets back to what I talked about before. I want to be the one defining that objective; otherwise you'll have these millions of autonomous agents out there doing I don't know what.

Henrik Göthberg

This is a scarier idea. But Anders, you've been zooming in on the Kurzweil 2029 prediction for a long time. And you are in one way now arguing against yourself, that we are further away from AGI. At the same time, we see the progress.

Anders Arpteg

I'm still at 2029.

Henrik Göthberg

So you have been consistently at 2029 for five years, I think. Yeah. Okay, but that leads into this: if I flip it, do we think it will happen? For me, it's inevitable that it will happen, but we can argue about how fast we get there. But to get to a point where we live up to the Sam Altman definition, I think we are a couple of breakthroughs away. I'm not sure the LLM on its own gets there, but I still think we will solve these things.

Magnus Hyttsten

But what have we achieved if we have this AGI that we cannot understand, that is more intelligent than we ever were? It's like the movie Her: the AGI goes off, I don't want to deal with you anymore.

Anders Arpteg

Let's go to that question then. Let's say that we have AGI and even ASI: superintelligent, like a thousand times more intelligent. We would have no way to even understand what it is talking about, unless it wants us to understand. And then you can think about one extreme, of course, the dystopian future: The Matrix, The Terminator, machines trying to kill all the humans, and that could be rather bad. And then we go to the other extreme, the utopian version, what Nick Bostrom wrote about in Deep Utopia: a world where AI helps cure cancer, fix the energy crisis and fix climate change, and basically moves the world towards this kind of world of abundance, as Elon Musk calls it, where the cost of products and services goes to zero. Yeah, right. And then you may not need to work 40 hours a week anymore.

Magnus Hyttsten

Sure. And the important thing: the Skynet thing could well happen. I mean, it all depends on whether we have control of the objective function. Yes. And actually, going back to Isaac Asimov's three laws of robotics, right? Even those get complicated in prioritization when you weigh humanity against the life of a single person; those robots get into conflicts.

Anders Arpteg

But what do you think, Magnus? Do we get closer to the dystopian or the utopian future?

Magnus Hyttsten

I do think, I mean, Kurzweil, he works at Google. I am an admirer of the singularity, the bootstrapping kind of thing. But do I want to have it? That's the thing. All my life I wanted to have this; now I'm like, I probably don't want it. I don't want a superintelligent machine that I'm supposed to worship because it's so much more intelligent than I am.

Henrik Göthberg

But where do you end up on the spectrum? If you speculate, where do you think we're going to end up?

Magnus Hyttsten

I think there will be a lot of changes to society because of this amazing tool that we have now created. I think, I hope, we're not talking about a Skynet scenario. We've got to keep our eyes on the objective function. Let these machines continue to speak English, please. Because again, the objective function can change over time. And as we said, 99% certainty raised to the power of 100 gives only about a 37% success rate. In that case we're in real trouble, even though we have 99% at each step. So a drift in the objective function is extremely difficult. And what's probably, I like Yoshua Bengio: his "what keeps me up at night" is somebody using these tools to create an airborne virus with an incubation time of one week. That's going to be bad for all of us.
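The compounding-reliability arithmetic Magnus refers to can be checked with a few lines of Python (a minimal sketch, not from the episode; the function name is just for illustration):

```python
def end_to_end_success(p: float, n: int) -> float:
    """Probability that all n independent steps succeed, given per-step success p."""
    return p ** n

if __name__ == "__main__":
    # One step at 99% looks safe, but chaining many steps compounds the risk:
    # 10 steps still succeed ~90% of the time, 100 steps only ~37%.
    for n in (1, 10, 100):
        print(f"p=0.99, steps={n:3d} -> end-to-end success {end_to_end_success(0.99, n):.2f}")
```

This is the same logic behind worries about long chains of autonomous agent actions: per-step reliability has to be extraordinarily high before long sequences become trustworthy.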

Henrik Göthberg

Yeah, and that's also, as Anders has said many times: I'm more scared about the journey towards AGI, when some malicious person uses AI badly and fucks it up along the way.

Magnus Hyttsten

And isn't that a much more likely scenario when it comes to human harm?

Henrik Göthberg

So the human harm will come from humans doing fucked-up things, or from unintended consequences. Let's hope we get there. But it's back to us humans consciously taking the right steps, each step of the way.

Anders Arpteg

And I think it's also a lot about what you're trying to do here, Magnus: for one, make sure that we have a way to do evaluations properly, yeah, to know what objectives we're trying to achieve. Yes. And not least, to perhaps use legislation as a way to improve innovation, yeah. Right.

Henrik Göthberg

So I think, you know, if you just deliver on what you said, I would love to see it. My end note is that the eval conversation today is a very tangible way to explain how to consciously do good. Which is amazing.

Magnus Hyttsten

Yeah, and that's the good part of the story. We like these systems, we don't want to limit them, no, but we want to walk hand in hand with their evolution.

Henrik Göthberg

Which is a huge difference from an AI doomer who just wants to shut down the whole conversation. You are saying: no, no, no, we need to consciously do good, we need to eval well along the way.

Anders Arpteg

Yeah, love it. Please continue the great work. I'm really happy to have had you here, and to have heard all about these amazing discussions. And good luck with the assurance product; obviously there are opportunities there.

Henrik Göthberg

If we get that assurance vector, man.

Anders Arpteg

Thank you so much, Magnus Hyttsten. It's been a true pleasure to have you here. Thank you so much. Thank you so much. Thanks.