What I learned at the AI Safety Europe Retreat

2023-04-17

From 30 March to 2 April 2023, I attended the Artificial Intelligence Safety Europe Retreat (AISER) in Berlin, Germany. There were around 70 attendees from all over Europe. Most were actively working on technical AI safety (e.g. SERI-MATS scholars and independent researchers with grants), some were focusing on AI safety strategy and governance, and some newcomers and students (like me) were looking to learn more about the field and its career opportunities. For a short overview of the retreat, have a look here.

This post is aimed at people who couldn't make it to AISER and/or are considering attending a similar event. Concretely, I will share the takeaways I gathered from the talks and one-on-ones I attended at AISER, and conclude with some personal (and thus very subjective) impressions of the event. I will not mention specific names, as the conversations and talks weren’t given with the intention of appearing in public. Still, I think many ideas exchanged at AISER are worth sharing with a wider audience. I will put my personal spin on them as I present them here, so all critique should be directed at me.

Some background about me may help you put this post into (critical) perspective. For a general overview of my experience and skills, please consult my about page. I wrote my undergrad thesis on robustness and formal verification of AI, but I wasn't aware of the AI alignment movement at the time. Before the retreat, I had only just completed the AGI Safety Fundamentals curriculum with a reading group at my university; otherwise, I had no experience in the field. Still, I'm motivated to work on AI Safety (AIS) because people seem to trust AI too much, and I want to make AI actually worthy of that trust. Another appeal of AIS is its interdisciplinary nature: it draws on diverse fields such as philosophy, mathematics, computer science, and everything in between, each of which I find fascinating to work in on its own.

My most important takeaways

Read the first sentence of each point if you only care about the main ideas; the text that follows expands on it.

On contributing to technical AIS as a newcomer

Tell your friends about AIS. This is one of the easiest ways to get involved. Note that you shouldn't try to convince your peers to also work on AI safety: doing so may create negative feelings, especially among your friends working on AI capabilities. Simply describe your work and why you are enthusiastic about it; people are usually more receptive when you are simply excited about something. This may inspire them to read or hear more, which would encourage them to form their own well-rounded opinion on AI safety.
Start an AGI Safety Fundamentals (AGI SF) reading group in your area or university. The curriculum changes from year to year, so taking the course again, even if you have done so in the past, may be quite beneficial. I personally find that revisiting material after a while solidifies my understanding of it. Moreover, hearing the attendees' different perspectives on an approach or idea in the field may be a reward in itself. To contribute even more, at the end of the course each participant could write a distillation blog post or give a presentation on a topic they enjoyed.
Help out with organizing events and field-building. Getting into technical AI safety (as discussed in the next section) is not easy, especially if you're in high school or early in your undergrad. Until you develop the chops to work on it yourself, you can organize events and field-building efforts for people who are more experienced and already able to contribute. This way, you are contributing to research by proxy.
Work on distillation. Distillations of complex ideas help newcomers grasp the main concepts faster, speed up research (since understanding comes faster), and solidify the writer's own knowledge of the concept. The classic examples are Chris Olah's blog and distill.pub. Nowadays, there is a lot of this work on LessWrong and the Alignment Forum; in fact, some organisations are being created precisely for distillation. I also picked up some tips on technical writing at AISER, which I summarize in a section below.
Contribute to open-source AIS projects. Example projects include the aisafety.info site led by Rob Miles (this gives a good idea of why it may be worth your time), or something more technical, such as the Eliciting Latent Knowledge repository.
Join an alignment hackathon. If you have experience in software engineering and/or machine learning, a hackathon can be a great way to challenge yourself and see whether you'd be a good fit for the specific alignment topic it tackles. Big names in the field sometimes organize these, and doing well in one could propel your career.
Check out aisafety.training. It is a database listing AI safety programmes. These programmes are usually quite competitive, and you will need good baseline skills to be considered. However, applying to ERA and SERI-MATS is known to be very beneficial in itself, as the applications prompt you to think hard about a problem in AI alignment.

On employment in AI Safety

The top companies working on applied AIS (DeepMind, Anthropic, Conjecture, OpenAI, Redwood Research) have an insanely high bar. There is a shortage of AI safety organisations that hire junior and mid-level people, so the current AIS job market is very competitive. As a result, the top companies in particular can be very picky and maintain a high level of excellence among their employees. If you apply to them (and you definitely should if you think you're good), prepare for a lot of rejection, as is usual for such competitive positions. To avoid stress or panic if you get rejected, have a good fallback plan ready.
Focus on getting good and demonstrating your skills somehow (blog posts, papers, open-source repos). Even if you have connections, you will not be hired if you're not good at what you do. If you consistently get rejected without an interview, you may not meet the basic prerequisites, which suggests that you need to upskill.
Most people who are already quite skilled in AI are 3 to 6 months away from meeting the high bar of the top organisations. This may mean that you have a lot of general experience in the area but lack the specific knowledge the organisation can directly benefit from. The top organisations most likely need you to be good now and contribute to their projects immediately, not in a few months during which you will “pick things up”. This may differ a lot from the mindset of regular jobs. It is on you to learn about their specific research agendas and demonstrate your experience in their narrow field.
Try to get a mentor and/or gather honest opinions from knowledgeable people who can give you constructive advice, and try to avoid your personal echo chamber. This is especially important while you are upskilling and still lack experience in the field. You can find mentors at EAG and EAGx events, by reaching out to SERI-MATS scholars, or through LessWrong, 80,000 Hours, Effective Thesis, ENAIS, etc. Remember that you don't need a single mentor for everything; that might be a lot of work for them! You could have several mentors, each covering one aspect (e.g. one for feedback, one for ideas, one for general debugging, etc.).
You don't have to get a PhD to work on AI alignment. It seems to me that this field, more than any other I've interacted with (except, perhaps, software engineering), welcomes new ideas and approaches from independent individuals. So, if you manage to get good on your own, big-name organisations may hire you without academic credentials. The most interesting alternative to me is independent research. EA organisations fund innovative projects, and after all, you don't need a lab for your experiments, at most some compute credits. The AIS community encourages posting your findings on LessWrong, the Alignment Forum, or personal blogs.
Also, you don't have to take the most common routes (independent research or getting hired by AIS organisations) to work on AI alignment. You can still work on AIS-related topics (e.g. robustness, fairness, transparency) at industry companies, and you can still write up your findings in blog posts or even publish papers. However, this route may require you to spend a significant amount of time on AI capabilities, which makes it less effective for AIS.
Before doing independent research long-term, think about whether it's really for you. Long-term independent research most likely means that you want to contribute to the field with your own ideas, not just engage with it to upskill. To succeed, you will have to be very knowledgeable, confident, and stubborn. You can only do this if you have a structured work ethic and manage to deliver on your own. Most importantly, you will have to be extremely proactive: reach out to people, find your own opportunities, set reasonable goals, and keep up with newly published research. Also, you will be alone most of the time, which may be demotivating and even a bit scary.

On technical writing

Before starting, have a small plan for what you want to achieve. Describe to yourself the theory of change behind your writing, e.g. "by the end of this post/talk, readers/attendees should be able to..."
Gauge your audience's knowledge and introduce concepts they may be lacking. In a conversation, ask whether they are familiar with X; in a blog post, it's perhaps better to assume minimal knowledge and briefly introduce concepts, while letting readers skip over them if they already know them.
Align mental models. Get feedback from someone and ask about their mental model, i.e. what they imagine as they read your introductory paragraphs. Does their model match yours? If it doesn't, is that good or bad? Is it simply a different perspective, or is the person imagining something wrong because of your text?
Concrete before abstract. Before introducing a mathematical definition, provide some examples of it. Perhaps motivate the problem through a story (why do we care about this?) and concrete examples, then abstract away and introduce the formal definition. This may make it easier for the reader to spot patterns and learn the concept, especially for non-math majors.
Explain through analogies. This ties in with the point above. Explain how the concept and the analogy are alike, but also how they differ, since an analogy can mislead. Test your analogies on others. An analogy should be very clear to everyone: the more familiar people are with it, the more easily they will see the connection to something else. (This is partly why I don't like MIRI's market analogies/definitions: I am not very familiar with markets, so the analogies only make the topic harder to understand.)
In distillation work, include mini-tests of knowledge. This way, the reader has to remember, understand, and apply the concept (closely following Bloom's taxonomy), e.g. "Stop and apply this concept to 3 things you know."

On the EU AI Act

"The AI Act is a proposed European law on artificial intelligence (AI) – the first law on AI by a major regulator anywhere. The law assigns applications of AI to three risk categories." More information and the working version of the Act can be found at artificialintelligenceact.eu/.
Example consequences of the Act include requiring chatbots to disclose that they are not human, and largely banning real-time remote biometric identification (such as live facial recognition) in publicly accessible spaces; the latter is categorized as an unacceptable-risk AI application.
High-risk AI technology will have to adhere to certain safety standards. This should spur awareness of AI safety, as well as of the need for overall guarantees for technology in Europe. The Act will force companies to think about AI safety; they will not be able to launch just anything. This makes AI products more akin to other potentially dangerous products on the market, such as cars.
The proposed AI Liability Directive would make it easier for individuals to sue companies if an AI system causes them harm.
The hope is that the Act will be enforced more rigorously than the GDPR. Each member state will be responsible for taking enforcement seriously. If a company still does not comply after several consecutive fines, its product may be banned.
AI products will be assigned to one of three risk categories. If the risk is high, the product may be audited. The exact standards that products will have to adhere to are still being developed, and how companies will be audited is still unclear. Interesting next steps include figuring out how exactly to perform audits, which authorities will be able to carry them out, how to make sure the results are actually good, and, if any threats are found, how to help resolve them. The ground is fertile for auditing organisations whose methodology is based on an AI safety approach.
The Act comes into play only when AI-powered products are deployed. Thus, it does not prevent someone from engineering an AGI in their basement.

Overall Impression

The event was well planned: everything was taken care of, including food, accommodation, and space for hang-outs and talks. Attendees could focus on people and ideas instead of event logistics. The event took place on the outskirts of Berlin, so the atmosphere for one-on-one walks was great. There was (in hindsight, unsurprisingly) a strong EA vibe at the event, both in the structure of the retreat and in people’s opinions and personal philosophies. I learned a lot from this getaway, met many new people whose motivation rubbed off on me, and the European AIS scene started to feel more like a community. For me personally, there was only one drawback (not the fault of the event itself), which left me exhausted afterwards.

As a newcomer (indeed, I seemed to be one of the people, if not the person, with the least exposure to the rationalist AI safety community at the event), I was overwhelmed by the amount of new information. I had felt similarly before when working on projects I had no previous exposure to, but the technical AI safety community is something else. It seems everyone is coming up with new research agendas and is encouraged to attack the problem in their own way. I think this is an interesting and worthy approach to solving alignment; however, it creates waves of new terminology and raises questions about how much potential these concepts actually have.

I think the feeling of overwhelm came primarily from not knowing the nuances of the different methods, not having a good overview of the field, and having no clue how the approaches mentioned fit into the bigger picture. Whenever I heard someone talk passionately about a technical approach, my first inner reaction was usually “this is a well-established approach, I should really look into it” (mostly because I am used to this from academia), which bred a sense of urgency and a feeling of having missed well-known research. When I did look into the approaches mentioned, however, it often turned out that someone had come up with the method at most 3 years ago in a blog post or research agenda that hadn’t yet had the chance to be thoroughly investigated. Of course, this by no means undermines the value of the approach in question, and my reaction may not be all that different from what one would experience in research-heavy academic environments. Even though I am aware that there are AIS labs at universities and well-established companies (as can be seen here), I still seem to have a bias that AIS research primarily comes from independent thinkers whose ideas are more questionable. Perhaps this is partly because a lot of the research circulates via blogs instead of peer-reviewed papers.

Clearly, the integrity of an idea does not depend on the affiliation of the researcher or the medium through which it is shared, but I must invest some time to overcome this personal bias. It is simply a very different exposure to terminology from what I’m used to, so I will have to change my mindset when encountering such terms in AI safety. That is, I must take a more sceptical yet honest view and be open to red-teaming an idea, instead of taking it at face value. Doing so may improve my overall research skills as well.

Something else I wasn't aware of previously is that this community really doesn't like anyone working on AI capabilities (that is, regular AI research and industry applications, e.g. creating new architectures, training models for novel tasks, etc.). You can read about the main reasons here (some of the points in this specific post are quite controversial, and not everyone in the AIS community seems to fully agree with them, as can be seen in the comments, but in my opinion it's one of the best one-stop links). I suppose the dislike is not nearly as strong for your run-of-the-mill AI startup as it is for the huge companies with more resources. However, non-AIS work is still seen as diverting effort from the most important problem at hand, so some distaste remains.

In conclusion, I would recommend attending such an event if you are trying to figure out whether AI safety is for you. It also benefits anyone looking for connections and collaborators, or simply wanting to get up to speed with what is going on in AIS. Be prepared for some intense philosophical and technical conversations; they can be both very rewarding and draining.