Since January 2024, I have been serving as the Community Lead for AI Safety & Alignment at Cohere Labs (formerly Cohere for AI), an open-science initiative.

Speaker Series

Over the past year, I have organized more than a dozen speaker sessions with researchers working at the frontier of AI safety, including:

  • Bartosz Cywinski (Anthropic / Oxford) — Eliciting Secret Knowledge from Language Models: Using logit lens, sparse autoencoders, and prompt attacks to discover hidden knowledge fine-tuned into LLMs.

  • Kola Ayonrinde (UK AI Security Institute) — What Makes a Good Model Explanation? A beginner-friendly introduction to mechanistic interpretability, feature finders, and compression-based criteria for evaluating circuit-level explanations.

  • Nathan Calvin (Encode AI) — AI Governance in Practice: How a three-person nonprofit helped shape California’s SB 53, the Frontier AI Transparency Act. Nathan’s background spans the Senate Judiciary Committee, the Center for AI Safety Action Fund, and Stanford Law.

  • Rehana Al-Soltane (Raspberry Pi Foundation) — Education in the Age of AI: How AI-enabled learning doesn’t automatically equate to deep learning, and what must change in both teaching practices and AI technologies.

Philosophy

If we want AI to benefit humanity, we have to fight for it. The people working on AI governance and accountability are just as important as the people building the models.