AIAW Podcast

AI News v4 2024 - AlphaGeometry, Solar 10.7B, Exphormers

Hyperight

Unveil the latest breakthroughs in AI with AIAW News' fresh release, AI News v4 2024! This edition brings you DeepMind's AlphaGeometry, an AI marvel solving Olympiad-level geometry problems with unprecedented proficiency. Get insights into Upstage's Solar 10.7B, a pioneering system transforming large language models for single-turn conversations. Witness Google's Exphormer, a machine learning innovation, tackling the scalability challenges of graph transformers and revolutionizing the analysis of graph-structured data. These trailblazing developments are reshaping the landscape of AI, from mathematical reasoning to advanced neural network applications. Tune in to the AIAW Podcast for a deep dive into these fascinating advancements at www.aiawpodcast.com or stay updated via Twitter @aiawpodcast.

Follow us on YouTube: https://www.youtube.com/@aiawpodcast

AI News

Speaker 1

It's time for AI News, brought to you by the AIAW Podcast.

Speaker 2

So we have added this kind of section in the middle of the podcast where we take a short break to speak about some personal favorites, news items that happened in the last couple of weeks, and each one of us can choose to bring up a topic. If you don't have any, feel free to ignore it, but if you have, try to summarize it in three or four minutes, if you can. Does anyone else want to go first?

Speaker 3

Holy crap, I can go.

Speaker 2

Yes, let's go. Super excited.

Speaker 3

No, I just came back from South Korea and I met with a lot of really interesting people. Primarily two things: one is the ability to train models on visual input, video and such, which is super cool, but one thing that really blew my mind is the small language models.

Speaker 2

Yeah, also.

Speaker 3

I had never tried it and I couldn't believe it. This is like the equivalent of me connecting to the internet. I was shocked. I downloaded it. I can't get it to run here in Europe, but I ran it in Korea for like a couple of weeks. It just helps me with my writing, like, in real time, on anything within my operating system. It's from the Upstage team.

Speaker 1

So the same one that built...

Speaker 3

The Solar model.

Speaker 1

So the idea with the small transformer is that I can actually put it inside my laptop and I can write in real time.

Speaker 3

It runs all the time, and whenever you write something, it's there.

Speaker 2

In real time, more or less, yeah.

Speaker 1

This is, for me, how it should work. No more flipping back and forth.

Speaker 3

Yeah, but I didn't want to believe it. I even had to go turn off my Wi-Fi, to check that it didn't sneak out on my Wi-Fi in some weird way. I didn't want to believe it because it was so magical. It was insane.

Speaker 2

Was it like autocompleting, or was it really, like, writing?

Speaker 3

I used it in different ways, but the classic kind: you start writing something and then that little icon comes up and then boom, you have, like, formal, informal. You can, like, auto-prompt it in some way: shorter, longer, use emojis, don't use emojis. Meaning that it takes whatever you write and then gives you a suggestion for how you can actually write it better.

Speaker 3

Out of the existing text? Of the text that I started to write, so it has a little bit of an input, and on that input it autocompleted-ish and styled it up into formal, informal and these kinds of things. So think of it like Grammarly, but real time, in everything, everywhere. And now, I don't know. I wrote in my predictions for 2024 about small models and how excited I was. I did as well, actually.

Speaker 3

And then I was trying it and I couldn't even predict how excited I was. Sorry, I just get all like a kid here. The organ is going to come back, the vacuum organ.

Speaker 2

I think that's one of the big trends for 2024, that we will have more efficient models. That is going to democratize, if you can abuse that word a bit, to basically have any influence.

Speaker 1

And this is what Gemini did, the storyline with the huge, the normal and the small one, the Nano one, they call it. It makes sense.

Speaker 3

I ran it on my old M1 MacBook, so it's not one of the super new ones. I mean, this is several years old, I don't even know how old it is, but it works like a charm, meaning that it's going to be in every single thing, more or less.

Speaker 1

So all of a sudden now you get to embed a language model inside a product or something.

Speaker 3

Equally as big as the internet, I would argue, and I think now it's much more. Yeah, I'm trying to be nice here, but internet, there's people here on the internet watching us.

Speaker 1

Okay, so that's good news.

Speaker 2

Alex, do you have anything you want to bring up?

Speaker 3

No, okay, I was thinking we're going to go back to that topic, but bring your parents.

Speaker 1

Actually, that was the news. That was the news topic that we can now bring into this, because, if we're cutting it, there was an Instagram. There was a storyline around recruitment in the US. Tell us the news, because this is news, because it's...

Speaker 3

Again, it's enough. Now he has forgotten. Go ahead and we'll come back in.

Speaker 2

Do you want to go next? Sure. A lot of stuff is happening, and it's actually been a number of weeks since we had the last podcast, now with the Christmas holidays and whatnot, so it's hard to choose what to really pick up on, but the one I chose is actually something that I have mixed feelings about. It's called AlphaGeometry. It's from DeepMind.

Speaker 2

And it combines these kinds of traditional rule engines, or symbolic engines, with language models. So in this case it's in a math competition called the International Mathematical Olympiad, and it basically becomes as good as the gold medalists, or between the silver and gold medalists, of the best humans competing in this kind of math Olympiad, and it's in geometry, but of course super impressive. What's new in this one is not really the combination of rules or symbolic systems with neural network kinds of solutions. That's been done a lot in this kind of hybrid system. But what's new is they use language models to do the creative part.

Speaker 2

So when they describe a bit what humans do, it's: okay, they can use deductive rules to try to see how can I prove that this kind of geometry has X and Y property, and they know the rules. The problem is that the thing that humans are really good at is coming up with rabbits, as they call it. So the rabbit is this kind of weird thing. What if we add this kind of variable to the equation, and it comes from nothing? It doesn't really make sense. You just have some kind of creative inspiration of what happens if we do this, and that is really a hard thing to do, especially for a computer, because it doesn't have any sense of why you should add this specific thing. So some kind of intuition is guiding humans in finding solutions for these kinds of math problems.

Now, what they were able to do better than anyone before was use large language models to do the creative part of pulling the rabbit out of the hat. I would say that they are basically taking the problem that a lot of people see with large language models, which is hallucination, that they just make up things that look good, but actually using that as a feature. So suddenly the ability of large language models to make up stuff becomes useful, because it's still related. It's based on some kind of intuition that you can't really describe. It's just that, okay, let's add this, in the language of math in this case.

So they first try to do this just with rules. They may fail to find a solution. Then they ask the language model: please add a construct, a new rabbit. Then they try again, and it may not work. And you try a second and a third time. Then, by adding a number of constructs, suddenly it finds a solution, it solves the problem, it finds the proof. And this kind of combination of using neural networks, large language models, to be the creative, hallucinating part of humans, combined with the very deductive, logical rule engine, is very fascinating. So this kind of creative part, being able to come up with strange things that you don't really have a reason for, intuition in some way, and you can call it hallucination if you want, is turning out to be really useful.
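The loop described here (try rules only, ask for a new construct when stuck, try again) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not DeepMind's implementation: the rule engine is a plain forward-chaining loop over string facts, the "language model" is a stand-in function that simply walks a fixed list of candidate constructs, and every name in it (`deduce`, `solve`, `propose_next`, the fact strings) is hypothetical.

```python
# Toy sketch: alternate a deductive rule engine with proposed "rabbits".
# Facts are plain strings; a rule is (premises, conclusion).

def deduce(facts, rules):
    """Forward-chain: keep applying rules until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def solve(facts, rules, goal, propose, max_constructs=3):
    """Return the constructs added before the goal was proven, or None."""
    known = deduce(facts, rules)
    added = []
    while goal not in known:
        if len(added) >= max_constructs:
            return None  # construct budget used up
        construct = propose(known, added)
        if construct is None:
            return None  # proposer has no more ideas
        added.append(construct)
        known = deduce(known | {construct}, rules)
    return added


# Hypothetical example: prove the base angles of an isosceles triangle equal.
FACTS = {"AB = AC"}
GOAL = "angle ABC = angle ACB"
RULES = [
    (frozenset({"AB = AC", "M is the midpoint of BC"}),
     "triangle ABM is congruent to triangle ACM"),
    (frozenset({"triangle ABM is congruent to triangle ACM"}),
     "angle ABC = angle ACB"),
]
CANDIDATES = ["circle centered at A through B", "M is the midpoint of BC"]


def propose_next(known, added):
    """Stand-in for the language model: return the next untried candidate."""
    for candidate in CANDIDATES:
        if candidate not in added:
            return candidate
    return None


print(solve(FACTS, RULES, GOAL, propose_next))
```

In this toy run the rules alone prove nothing, the first proposed construct does not help, and the second one (the midpoint) lets the rule engine finish. A proposal that turns out to be useless costs only one more deduction pass, which is why a proposer that sometimes makes things up is tolerable in this setup.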

Speaker 1

And what is the usefulness? Is it creating new ideas on this? Why is this useful?

Speaker 2

They just use it to show that the AI model is as good as humans, more or less, in creative thought, in finding proofs for math problems. But of course, in the future, if you have another problem, let's say you want to solve one of the biggest problems of all, how you can combine quantum mechanics with relativity: we haven't been able to do so, but perhaps if we have a sufficiently intelligent AI system that can start to add constructs and rabbits, perhaps we will find the solution.

Speaker 1

Because the point is that you're not going to solve a new problem with old thought, so you need to be creative in some sense to combine things. To be creative. Is that...?

Speaker 2

I think humans are not logical. I heard someone, I think it was my PhD supervisor, who said that 90% of human reasoning is abductive. Abductive means it's not deductive, where something is certain from a logical point of view; abductive basically means you haven't proven it's wrong.

Speaker 2

You just make an assumption. I think a car will fly into this room; prove me wrong. Or I think God exists; prove me wrong. You know, it's an abductive kind of reasoning, and this is really what they are doing in this model: abductive reasoning, not deductive. So they're combining abductive with deductive reasoning, and the marriage between the two makes it possible to find mathematical proofs in a way we've never seen before.

Speaker 1

So, in a nutshell, abductive and deductive, and this is one of the ways forward for next-generation mathematical problem solving.

Speaker 2

I think, for problems with the physical world that we have, for the economic world we have, for the medical world we have, for the material world we have.

Speaker 1

But those are complex problems where you need to find new ways to think about how to solve them, because humans have not solved them yet.

Speaker 2

Yeah, yeah, we're just too stupid to solve some problems. I think it's maybe the structure.

Speaker 3

I remember we talked about it last time I was here, because I had just written a piece on creativity in that sense and how I tried to formulate it into a differential equation, meaning if I can write it as some sort of equation or formula, then I understand it on a scalable level. One of the things I mentioned, an old thing that really blew my mind, was this old Foldit. I don't know if you remember this. It's a protein folding game, Foldit, so it was for super nerds. This was before we had cloud computing, meaning that the biggest problem you have to solve if you have hard computing problems is where to compute. Meaning we had local clusters at universities and these things. Some of us are nodding and having good memories, or painful memories maybe, but one thing that I also found interesting with this case is that you can sign up for this. If you have a university account, you get a protein folding sequence. They're trying to solve the HIV problem.

Speaker 3

The scientific community had tried for 10 years to solve this. Solved 35%. All of a sudden you get a small package. It's a fold. Somebody has a stupid idea from another department, maybe in this case the mathematics department or some other sec. Why don't you train this? This is not how we do protein folding. We all know how it would sound.

Speaker 2

I need to be more creative. By the way, Alex, you had to leave at half past, right?

Speaker 1

What was it?

Speaker 2

No, go on. Otherwise we need to focus a bit more, before you have to leave, on your kind of favorite topics.

Speaker 1

No, it's all right. The list is long. Awesome, Henrik. I'm going to go with the example topic I mentioned to you. I'm completely out of my depth here, but I want to talk about it. Completely out of my depth. Yeah, yeah, because I'm moving into his territory.

Speaker 3

Aha, now I get it. I was like, what?

Speaker 1

I stumbled upon a paper by Google Research. It came out, I think, Tuesday, January 23rd, so this week. It's about the Exphormer, so sparse transformers for graphs. This is a paper that gives design patterns for an architecture for how you build transformers that are more sparse. What does that mean? It means: how do you work with the whole transformer in a way that you don't really need to light up the whole neural network?

Speaker 2

Sort of thing.

Speaker 1

It's expander graphs, and it's about being way more efficient with compute and with energy and everything like that in order to find the right results. So I think the theme here, which I think is a trend, is how do we build things more efficiently, and what are we talking about when we say sparse transformers and stuff like this? So I'm going to lean a little bit on you on this here now, but I think the trend here is a little bit interesting. We started with Mixtral, the mixture of experts.

Speaker 2

I wanted to read about it quickly, and I haven't read the article, so I'm reading right now. Don't be an expert.

Speaker 1

You don't need to be an expert, but I just find the trend here interesting. We are trying to figure out how to not light up the whole neural network, is the bottom line.

Speaker 2

Yeah, and I haven't read the article, so I can't say anything smart about it, but I can at least give some kind of context, perhaps.

Efficient Transformers in Graph Structures

Speaker 2

And we know transformers have really transformed, pun intended, so much of AI, in that we have been able to use sequences like words or even images in a way that we've never seen before. So with the attention that they have, knowing what to focus on, they have really revolutionized how we can use AI. Now, to use it on sequences or images is one thing, but the most general kind of data structure you have is a graph in some way: a graph with nodes and edges that you can connect to each other.

Speaker 2

And how can you use a transformer network for graphs? That's a hard problem when they're sparse. A sparse graph basically means you have very few connections. In a social network, you have a few friends, but if you take three steps out from yourself, very few of those people are connected to each other, so it's very sparse, and that becomes hard to represent. So they need to have a way to represent these kinds of sparse networks and graphs in a good way, and I'm guessing, without having read the paper, that that's something that this...

Speaker 1

So the bottom line of the paper is that when you have a sparse network and you're trying to do exactly that, there are actually not that many nodes involved in reality, but because the transformer doesn't know that, it needs to go over the whole fucking Facebook before it can realize that it was just three fucking nodes. And now, with Exphormer, they're trying to figure out how to circumvent that. So, basically, not going through the whole Facebook graph in order to find something that was quite easy to find, because your network was in reality quite sparse. If you think about it, it's lighting up every single person on Facebook in order to check, "Do you know Henrik?", rather than doing it in another way.

It's about efficiency here. You know, I picked something out of my head, competence-wise, but I think it's a trend, with the mixture-of-experts approaches, with the nano approaches, with the sparse network approaches. This is a major trend for 2024 around efficiency and smartness. So we're going to see. You know, what is GPT-5? What's it going to be? It's going to be more, bigger. It's going to be a mixture of experts. You know, what are the driving forces? One of the driving trends is efficiency, so we can use it in different ways, you know, embedded and stuff like this.
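The "don't light up the whole network" point can be made concrete with a small Python sketch that compares how many node pairs attention has to score when every node looks at every other node versus when each node only looks at its neighbors plus a few extra long-range links. This is a rough sketch under stated assumptions, not the Exphormer architecture: the random extra links are only a crude stand-in for the expander-graph edges mentioned in the discussion, node features double as queries, keys and values, and all function names (`attend`, `dense_pattern`, `sparse_pattern`, `count_pairs`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, allowed):
    """One attention step where node i only looks at nodes in allowed[i].

    features: list of equal-length vectors, used as query, key and value.
    allowed:  one list of target node indices per node.
    """
    dim = len(features[0])
    out = []
    for i, query in enumerate(features):
        targets = allowed[i]
        scores = [
            sum(q * k for q, k in zip(query, features[j])) / math.sqrt(dim)
            for j in targets
        ]
        weights = softmax(scores)
        out.append([
            sum(w * features[j][d] for w, j in zip(weights, targets))
            for d in range(dim)
        ])
    return out


def dense_pattern(n):
    """Full attention: every node attends to every node."""
    return [list(range(n)) for _ in range(n)]


def sparse_pattern(n, edges, extra_per_node=2, seed=0):
    """Self loop + graph edges + a few random long-range links per node."""
    rng = random.Random(seed)
    allowed = [{i} for i in range(n)]
    for a, b in edges:
        allowed[a].add(b)
        allowed[b].add(a)
    for i in range(n):
        for j in rng.sample(range(n), extra_per_node):
            allowed[i].add(j)
            allowed[j].add(i)
    return [sorted(targets) for targets in allowed]


def count_pairs(allowed):
    """Number of (node, target) pairs that attention has to score."""
    return sum(len(targets) for targets in allowed)


# A ring of 1000 "people" who each know only their two neighbors.
n = 1000
ring = [(i, (i + 1) % n) for i in range(n)]
print("dense pairs: ", count_pairs(dense_pattern(n)))
print("sparse pairs:", count_pairs(sparse_pattern(n, ring)))
```

The dense pattern scores n² pairs (one million here), while the sparse pattern stays within a small constant multiple of n, at most 7n with these settings. The extra links are there so that information can still travel across the graph in a few hops instead of having to walk the ring one neighbor at a time.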

Speaker 2

This is my sort of... It's a way to use transformers efficiently for a new structure, which is graphs. There's been other times, like with Robert Luciani.

Speaker 1

Yeah, he had this story.

Speaker 2

He's been using, actually... Actually, I was going to...

Speaker 1

I was going to ask Robert, you know, because he used graphs and transformers to solve the traveling salesman problem, and he had an idea and he's been proving it, but no one understood what he did two years ago. And I think now we're getting there. I still haven't seen anything mathematical; he was claiming he could, you know, he could prove it.

Speaker 3

He didn't prove it. He claimed he could prove it. That's not proving it, no.

Speaker 1

You know, I was in a business meeting together with Robert, with the guys at AI research at Volkswagen, talking about this. You know, it's an optimization problem in the transport system. They didn't get it, but in reality, what he did is, you know, thinking about transformers in a GNN and combining that smartly, and I think this is kind of going in this direction, I guess.