AIAW Podcast

AI News v4 2024 - AlphaGeometry, Solar 10.7B, Exphormers

Hyperight

Unveil the latest breakthroughs in AI with AIAW News' fresh release, AI News v4 2024! This edition brings you DeepMind's AlphaGeometry, an AI marvel solving Olympiad-level geometry problems with unprecedented proficiency. Get insights into Upstage's Solar 10.7B, a pioneering system transforming large language models for single-turn conversations. Witness Google's Exphormer, a machine learning innovation, tackling the scalability challenges of graph transformers and revolutionizing the analysis of graph-structured data. These trailblazing developments are reshaping the landscape of AI, from mathematical reasoning to advanced neural network applications. Tune in to the AIAW Podcast for a deep dive into these fascinating advancements at www.aiawpodcast.com or stay updated via Twitter @aiawpodcast.

Follow us on YouTube: https://www.youtube.com/@aiawpodcast

Speaker 1:

It's time for AI News, brought to you by AIAW Podcast.

Speaker 2:

So we have added this kind of section in the middle of the podcast where we just have a short break, speaking about some personal favorites, news items that happened in the last couple of weeks, and each one of us can choose to bring up a topic. If you don't have any, feel free to skip, but if you have, try to summarize it in three, four minutes, if you can. Anyone want to go first?

Speaker 3:

Holy crap, I can go.

Speaker 2:

Yes, let's go. Super excited.

Speaker 3:

No, I just came back from South Korea and I met with a lot of really interesting people. Primarily two things: one is the ability to train models on visual input, video and such, which is super cool, but one thing that really blew my mind is the small language models.

Speaker 2:

Yeah also.

Speaker 3:

I had never tried it and I couldn't believe it. This is like the equivalent of me connecting to the internet. I was shocked. I downloaded it. I can't get it to run here in Europe, but I ran it in Korea for like a couple of weeks. It just helps me with my writing, like real time, on anything within my operating system. It's from the Upstage team.

Speaker 1:

So the same one that built.

Speaker 3:

Solar Model.

Speaker 1:

So the idea with the small transformer is actually I can put it inside my laptop and I can write in real time.

Speaker 3:

It runs all the time and whenever you write something, it's there.

Speaker 2:

In real time, more or less yeah.

Speaker 1:

This is for me how it should work. No but flipping back and forth.

Speaker 3:

Yeah, but I was. I didn't want to believe it. I had to even go turn off my Wi-Fi, even check that it didn't sneak out on my Wi-Fi in some weird way. It's like I didn't want to believe it because it was so magical, it was insane.

Speaker 2:

Was it like auto completing or was it really like writing so?

Speaker 3:

I used it in different ways, but the classical kind: you start writing something and then that little icon comes up and then boom, and then you have like formal, informal. You can like auto-prompt it in some way: shorter, longer, use emojis, don't use emojis. Meaning that it takes whatever you write and then gives you a suggestion for how you can actually write it better.

Speaker 3:

Out of the existing text? Of the text that I started to write, so it has a little bit of an input and on that input it's like auto-completed-ish and styled it up into formal, informal and these kinds of things. So think of it like Grammarly, but real time in everything, everywhere. And now, like, I don't know. I wrote in my predictions for 2024 about small models and how excited I was. I did as well, actually.

Speaker 3:

And then I was trying it and I couldn't even predict how excited I was. Sorry, I just get all like a kid here. The organ is going to come back, the vacuum organ.

Speaker 2:

I think that's one of the big trends for 2024, that we will have more efficient models. That is going to democratize, if you can abuse that word a bit to basically have any influence.

Speaker 1:

And this is all that Gemini did the storyline with the huge, the normal and the small, the nano one, the nano one they call it. It makes sense.

Speaker 3:

I run it on my old M1 MacBook, so it's not one of the super new ones. I mean, this is several years old, I don't even know how old it is, but it works like a charm, meaning it's going to be in every single thing, more or less.

Speaker 1:

So all of a sudden now you get to embed a language model inside a product or something.

Speaker 3:

Equally as big as internet, I would argue, and I think now it's much more. Yeah, I'm trying to be nice here, but internet there's people here in internet watching us.

Speaker 1:

Okay, so that's good news.

Speaker 2:

Alex, do you have anything you want to bring up?

Speaker 3:

No, okay, I was thinking we're going to go back to that topic, but bring your parents.

Speaker 1:

Actually, that was the news. That was the news topic that we can now bring into this, because, if we're cutting it, there was an Instagram. There was a storyline around recruitment in US. Tell us the news, because this is news, because it's.

Speaker 3:

Again, it's enough. Now he has forgotten. Go ahead and we'll come back in.

Speaker 2:

Do you want to go next? Sure, a lot of stuff is happening, and it's actually been a number of weeks since we had the last podcast, now with the Christmas holidays and whatnot, so it's hard to choose what to really pick up on, but the one I chose is actually something that I have mixed feelings about. It's called AlphaGeometry. It's from DeepMind.

Speaker 2:

And it combines these kind of traditional rule engines or symbolic engines with language models. So in this case it's in a competition, a math competition called the International Math Olympiad, and it basically became as good as the gold medalists, or between the silver and gold medalists, of the best humans ever competing in this kind of math Olympiad. It's in the geometry kind of math, but of course super impressive. What's new really in this one is not the combination of having rules or symbolic systems combined with neural network kind of solutions. That's been done a lot in this kind of hybrid system. But what's new is they use language models to do the creative part.

Speaker 2:

So when they describe a bit what humans do, it's: okay, they can use deductive rules to try to see how can I prove that this kind of geometry has X and Y property, and they know the rules. The problem is that the thing that humans are really good at is coming up with rabbits, as they call it. So the rabbit is this kind of weird thing: what if we add this kind of variable to the equation? And it comes from nothing. It doesn't really make sense. You just have some kind of creative inspiration of what happens if we do this, and that is really a hard thing to do, especially for a computer, because it doesn't have any sense of why you should add this specific thing. So some kind of intuition is guiding humans in finding solutions for these kind of math problems.

Now, what they were able to do better than anyone before was use large language models to do the creative part of pulling the rabbit out of the hat. I would say that they are basically taking the problem that a lot of people see with large language models, which is hallucination, that they just make up things that look good, and actually using that as a feature. So suddenly the ability of large language models to make up stuff becomes useful, because it's still related. It's based on some kind of intuition that you can't really describe. It's just: okay, let's add this, in the language of math in this case.

So they first try to do this just with rules. They may fail to find a solution. Then they ask the language model: please add a construct, a new rabbit. Then they try again and it may not work. And you try a second and a third time. Then, by adding a number of constructs, suddenly it finds a solution, it solves the problem, it finds the proof. And this kind of combination of using neural networks, large language models, to be the creative, hallucinating part of humans, combined with the very deductive, logical rule engine, is very fascinating. So this kind of creative part of being able to come up with strange things that you don't really have a reason for, intuition in some way, and you can call it hallucination if you want, is turning out to be really useful.
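The loop described here (try the rules, get stuck, ask for a new construct, try again) can be sketched in a few lines. This is only a toy illustration, not DeepMind's implementation: the names `deduce`, `prove` and `toy_proposer` are made up, the "facts" are plain strings, and the proposer is a fixed list standing in for the language model.

```python
from typing import Callable, Optional

Rule = tuple[frozenset[str], str]  # (premises, conclusion)


def deduce(facts: set[str], rules: list[Rule]) -> set[str]:
    """Deductive part: apply the rules until nothing new can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def prove(
    goal: str,
    facts: set[str],
    rules: list[Rule],
    propose: Callable[[set[str], list[str]], Optional[str]],
    max_constructs: int = 3,
) -> tuple[bool, list[str]]:
    """Alternate deduction with proposed constructs ("rabbits")."""
    known = deduce(facts, rules)
    added: list[str] = []
    for _ in range(max_constructs):
        if goal in known:
            break
        construct = propose(known, added)  # the creative step (hypothetical stub)
        if construct is None:
            break
        added.append(construct)
        known = deduce(known | {construct}, rules)
    return goal in known, added


def toy_proposer(known: set[str], added: list[str]) -> Optional[str]:
    """Stand-in for the language model: suggests constructs in a fixed order."""
    candidates = ["O is the circumcenter of ABC", "M is the midpoint of BC"]
    for candidate in candidates:
        if candidate not in known:
            return candidate
    return None


# Toy problem: in a triangle with AB = AC, show that angle B equals angle C.
rules: list[Rule] = [
    (frozenset({"AB = AC", "M is the midpoint of BC"}), "ABM and ACM are congruent"),
    (frozenset({"ABM and ACM are congruent"}), "angle B = angle C"),
]
proved, constructs = prove("angle B = angle C", {"AB = AC"}, rules, toy_proposer)
print(proved, constructs)
```

In this toy run the rules alone get stuck, the first suggested construct does not help, and the second one (the midpoint) unlocks the proof, which mirrors the "try again a second and a third time" description above.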

Speaker 1:

And what is the usefulness? Is it creating new ideas on this? Why is this useful?

Speaker 2:

They just use it to show that the AI model is as good as humans, more or less, in creative thought, in finding proofs for math problems. But of course, in the future, if you have another problem, let's say you want to solve one of the biggest problems of all, how you can combine quantum mechanics with relativity. We haven't been able to do so, but perhaps if we have a sufficiently intelligent AI system that can start to add constructs and rabbits, perhaps we will find the solution.

Speaker 1:

Because the point is that you're not going to solve the new problem with old thought, so you need to be creative in some sense to combine things. To be creative Is that?

Speaker 2:

I think humans are not logical. I heard someone, I think it was my PhD supervisor, who said that 90% of human reasoning is abductive. Abductive meaning it's not deductive, where something is certain from a logical point of view. Abductive means, basically, you haven't proven it's wrong.

Speaker 2:

You just make an assumption. I think a car will fly into this room, prove me wrong. Or I think God exists, prove me wrong. You know, it's an abductive kind of reasoning, and this is really what they are doing in this model: abductive reasoning, not deductive. So they're combining abductive with deductive reasoning, and the marriage between the two makes it possible to find mathematical proofs in a way we've never seen before.

Speaker 1:

So, in a nutshell, abductive and deductive, and this is one of the ways forward for next-generation-level mathematical problem solving.

Speaker 2:

I think, problems with the physical world that we have, for the economical world we have, for the medical world we have, for the material world we have.

Speaker 1:

But those complex problems that you need to find new ways to think about how to solve them, because humans have not solved them yet.

Speaker 2:

Yeah, yeah, humans are just too stupid to solve some problems. I think it's maybe the structure.

Speaker 3:

I remember we talked about it last time I was here, because I just wrote a piece on creativity in that sense and how I tried to formulate it into a differential equation, meaning if I can write it as some sort of equation or formula, then I understand it on a scalable level. One of those I mentioned, an old thing that really blew my mind, was this old kind of Foldit. I don't know if you remember this. It's a protein folding game, Foldit, so it was for super nerds. This is before we had cloud computing, meaning that the biggest problem you have to solve if you have hard computing problems is where to compute. We had local clusters at universities and these things. Some of us are nodding and having good memories, or painful memories maybe. But one thing that I found also interesting with this case is that you can sign up for this. If you have a university account, you get a protein folding sequence. They're trying to solve the HIV problem.

Speaker 3:

The scientific community have tried for 10 years solving this. Solved 35%. All of a sudden you get a small package. It's a fold. Somebody has a stupid idea from another department, maybe in this case the mathematical department or some other sec. Why don't you train this? This is not how we do protein folding. We all know how it would sound.

Speaker 2:

I need to be more creative. By the way, Alex, you had to leave at half past, right what?

Speaker 1:

was it.

Speaker 2:

No, go on. Otherwise we need to focus a bit more, before you have to leave, on your kind of favorite topics.

Speaker 1:

No, it's all right. The list is long. Awesome, Henrik. I'm going to go with the Exphormer topic I mentioned to you. I'm completely out of my depth here, but I want to talk about it. Completely out of my depth. Yeah, yeah, because I'm moving into his territory.

Speaker 3:

Aha, now I get it. I was like what?

Speaker 1:

I stumbled upon a paper by Google Research. It came out, I think, Tuesday, January 23rd, so this week. It's about the Exphormer, so sparse transformers for graphs. This is a paper that gives design patterns for how you build transformers that are more sparse. What does that mean? It means how do you work with the whole transformer in a way that you don't really need to light up the whole neural network?

Speaker 2:

sort of thing.

Speaker 1:

It's expander graphs, and it's about being way more efficient with compute and with energy and everything like that, in order to find the right results. So I think the theme here, which I think is a trend, is how do we build things more efficiently, and what are we talking about when we say sparse transformers and stuff like this? So I'm going to lean a little bit on you on this here now, but I think the trend here is a little bit interesting. We started with Mixtral, mixture of experts.

Speaker 2:

I wanted to read about it quickly and I haven't read the article, so I'm reading right now. Don't be an expert.

Speaker 1:

You don't need to be an expert, but I just find the interesting trend here. We are trying to figure out how to not light up the whole neural network is the bottom line.

Speaker 2:

Yeah, and I haven't read the article so I can't say anything smart about it, but I can at least give some kind of context perhaps. And we know transformers have really transformed, pun intended, so much of the AI that we have, being able to use sequences like words or even images in a way that we have never seen before. So with the attention that they have, knowing what to focus on, they have really revolutionized how we can use AI. Now, to use it on sequences or images is one thing, but the most general kind of data structure you have is a graph in some way. A graph with nodes and edges that you can connect to each other.

Speaker 2:

And how can you use a transformer network for graphs? That's a hard problem when they're sparse. A sparse graph basically means you have very few connections. In a social network, you have a few friends, but if you take three steps out from yourself, very few of the people are connected to each other, so it's very sparse and that becomes hard to represent. So they need to have a way to represent this kind of sparse networks and graphs in a good way, and I'm guessing, without having read the paper, that that's something that this does.

Speaker 1:

So the bottom line of the paper is that when you have a sparse network and you're trying to do exactly that, there are actually not that many nodes involved in reality. But because the transformer doesn't know that, it needs to go over the whole fucking Facebook before it can realize that it was just three fucking nodes. And now, with Exphormer, they're trying to figure out how to circumvent that. So, basically, not going through the whole Facebook graph in order to find something that was quite easy to find, because your network was in reality quite sparse. If you think about it, it's lighting up every single person in Facebook in order to check: do you know Henrik?

Speaker 1:

You know, rather, doing it in another way. It's about efficiency here. I picked something way over my head, competence-wise, but I think it's a trend: with the mixture-of-experts approaches, with the nano approaches, with sparse network approaches. This is a major trend for 2024 around efficiency and smartness. So we're going to see, you know, what is GPT-5? What's it going to be? Is it going to be bigger? Is it going to be a mixture of experts? You know, what are the driving forces? One of the driving trends is efficiency, so we can use it in different ways, you know, embedded and stuff like this.
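To put rough numbers on the "don't light up the whole Facebook" point: a plain transformer lets every node attend to every other node, which is quadratic in the number of nodes, while a sparse graph transformer only allows a limited set of (query, key) pairs. The sketch below counts those pairs. It is a loose illustration under stated assumptions, not the Exphormer code: the function name `sparse_attention_pairs` is made up, and the random shortcut edges are only a stand-in for a proper expander graph, combined with the real graph edges and one virtual global node.

```python
import random


def sparse_attention_pairs(
    n: int,
    edges: list[tuple[int, int]],
    extra_degree: int = 3,
    seed: int = 0,
) -> set[tuple[int, int]]:
    """Return the (query, key) pairs that are allowed to attend to each other."""
    rng = random.Random(seed)
    pairs = {(i, i) for i in range(n)}  # every node attends to itself

    # Local attention: only along the real edges of the graph.
    for u, v in edges:
        pairs.add((u, v))
        pairs.add((v, u))

    # Long-range shortcuts: a few random permutations of the nodes
    # (an assumed stand-in for an expander graph).
    for _ in range(extra_degree):
        perm = list(range(n))
        rng.shuffle(perm)
        for u, v in enumerate(perm):
            pairs.add((u, v))
            pairs.add((v, u))

    # Global attention: one virtual node (index n) connected to everyone.
    hub = n
    for i in range(n + 1):
        pairs.add((i, hub))
        pairs.add((hub, i))
    return pairs


# A ring of 1000 "friends": each person knows only two neighbours.
n = 1000
ring = [(i, (i + 1) % n) for i in range(n)]
pairs = sparse_attention_pairs(n, ring)
dense = (n + 1) ** 2  # full attention over all nodes plus the virtual node
print(f"sparse pairs: {len(pairs)}, dense pairs: {dense}")
```

The sparse count grows roughly linearly with the number of nodes and edges, while the dense count grows with the square of the number of nodes, which is the efficiency argument being made here.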

Speaker 2:

This is my sort of... It's a way to use transformers efficiently for a new structure, which is graphs. There's been other times, Robert Luciani...

Speaker 1:

Yeah, he had this story.

Speaker 2:

He's been using actually, actually, I was going to.

Speaker 1:

I was going to ask Robert, you know, because he used graphs and transformers to solve the traveling salesman problem. He had an idea and he's been proving it, but no one understood what he did two years ago. And I think now we're getting there. I still haven't seen any mathematical proof; he was claiming, you know, he could prove it.

Speaker 3:

He didn't prove it. He claimed he could prove it. That's not proving it, no.

Speaker 1:

You know, I was in a business meeting together with Robert, with the guys at AI research at Volkswagen, talking about it. You know, it's an optimization problem in the transport system. They didn't get it, but in reality what he did is, you know, thinking about transformers in a GNN and combining that smartly. And I think this is kind of going in this direction, I guess.