On AI Safety, Incentives as Filters, and Corporate Culture
- Six Thoughts on AI Safety: Covers a range of interesting ideas. One is that AI alignment is not about "human values" first and "following rules" second. Instead, compliance with specs/rules should come first (similar to the way humans behave!), with AIs falling back on human values only when out of distribution. A second idea is getting safety from a balance of power (players A and B both have ASIs, so they protect/cancel/dampen each other). My opinion: deterrence theory collapses the more lethal, accessible, and concealable a weapon is.
- Incentives as selection effects: A fun thought experiment about the role incentives play in life. Instead of thinking of them as forcing functions that change behavior, consider them filtering mechanisms (we can't steer behavior; we can only select for people already apt to it). This seems only partly true (a contributing factor rather than the whole story): there is a spectrum of how much a person wants something, and depending on how strongly we incentivize, we can bring more out of one person than another.
- Culture Matters: While culture mostly sits in the background, it can work wonders in aligning and growing team efforts. Culture shapes what people believe is important (mission) and what they believe is possible (growth). We often absorb our sense of what's achievable from our cultural environment (peers and mentors), and that sense becomes a major driver of our performance.