AI Alignment from First Principles
On two recent preprints I released on the foundational Philosophy of AI Alignment
I joined a group working on the AI Alignment problem that secured funding from an AI-Alignment research outfit founded by Emmett Shear (co-founder of Twitch, who briefly replaced Sam Altman during an OpenAI drama moment).
When discussions for the funding proposal began, I did what is standard fare and devoured as much literature on the topic as I could. As I went through it, however, the process became comprehensively disappointing. There were no foundations; there were just ideas and disconnected simulations. There was no general functional corresponding to Alignment, and there was no formalism defining exactly what constitutes it. Moreover, “Human-Aligned AI” presents the question: which human? There are plenty of humans out there. Consider the values of 1) a typical college-educated white-collar professional in San Francisco, 2) an evangelical Christian farmer in Oklahoma, and 3) a Berber market trader in Morocco. They will all be very different, on many matters!
Asking others about it didn’t help either. It seemed that ultimately AI Alignment was whatever the people who wrote about it said it was, as affirmed by others known to be experts in the field.
As such, it was entirely memetic: disembodied signifiers floating far above any reality-checking, prone to psychological bandwagon effects and to incentives towards ineffective or even harmful ideas and procedures pushed by groupthink dynamics, and an overall challenging environment in which to perform research inquiry with epistemic virtue.
I studied this community like an Anthropologist, observing their patterns of speech and behavior as communicated in their digital content. The following is a hypothetical example of a typical AIA ritualistic tribal memecycle:
1) Yoshua Bengio gives a big speech declaring ``Soosee is the future of AI and can solve all of our problems or could destroy us all. Soosee is deep...it is powerful...what Soosee is..is the radically empowered authenticity exponentiated through digital wisdom wizardry.''
2) Early adopter CS researchers begin publishing papers on Soosee
3) Anthropic announces that its simulations predict Soosee will annihilate humanity in two years and that it needs $20 billion more in VC funding to save everyone
4) NeurIPS Workshop on Soosee
5) Eliezer Yudkowsky writes an 800-page book on Soosee that concludes that everyone should one day become a transhuman cyborg
6) (By the way, no one actually knows what Soosee is, but everyone is pretending they do)
7) OpenAI has a rogue researcher flee the company with warnings of Soosee-aided world domination; he claims to know the secrets of understanding Soosee and raises $3 billion in VC startup capital without even a minimal viable product
8) Rank and file CS researchers begin publishing papers on Soosee
9) Fareed Zakaria publishes ``Is Soosee Compatible with Democracy?'' It's a bestseller and is talked about by the media for a few months
10) (By the way, still no one actually knows what Soosee is, but everyone keeps pretending they do to signal in-group membership)
11) Lower tier conferences have workshops on Soosee
12) Joe Rogan has a podcast discussing ``Soosee and Aliens: a Hidden Connection?'' with, for some reason, Marilyn Manson
At this point the memecycle repeats itself with a new idea and theme.
I decided that if I were to make sense of this topic, I would have to formulate its foundations in a comprehensively methodical way.
And so, as my component of the proposal and during the research, I sought to fully formalize: What is AI Alignment? How can one define appropriate optimization or optimization-like procedures to maximize Alignment, or to enforce some reasonable minimal constraint on it?
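To fix ideas, one generic shape such a formalization might take (my own schematic here, not the papers’ actual formalism) is a constrained optimization over policies: maximize an alignment functional subject to hard minimal-enforcement constraints,

$$\pi^{\*} \in \arg\max_{\pi \in \Pi} \; \mathcal{A}(\pi, W) \quad \text{s.t.} \quad g_i(\pi, W) \ge 0, \quad i = 1, \dots, m,$$

where $\mathcal{A}$ is the (to-be-defined) alignment functional over the agent policy $\pi$ and world model $W$, and the $g_i$ encode crisp guardrails. The substance of the work is precisely in giving $\mathcal{A}$ and the $g_i$ non-arbitrary content.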
Physical Cybernetic Aspects of AIA
The first paper provides a history of AI, defines an ontology and formalism for how to consider AIA, and then describes the primary themes and problems, techniques, and frameworks for developing new techniques for the cybernetic and engineering-based aspects of AIA.
AI Alignment Foundation from First Principles: Cybernetic, Mechatronic and Engineering Aspects
Here the ontological formalism is built on the foundations of the philosophy of Badiou, who declares that ontology is mathematical formalism. Through this lens, the substrates of AI (the agent itself, the world it interacts with, the digital software and means of production used to develop it, and the means of processing sensor and actuator signals) are each constituted so as to correspond to an appropriate mathematical domain, including physics, logic, statistics, etc.
This last point is critical: aligning AI here means aligning it to reality and to the teleology of the operation, for which scientific domain knowledge together with computational mathematics provides the pivotal toolkit. If the world model involves a classical physical system, say springs or billiard balls, then a neural network architecture that inherently constrains the learned dynamics to exhibit the properties of geometric mechanics becomes relevant. For certain hard guardrails on behavior in an environment, deontic logic provides an interpretable, yet still firm and crisp, integration of background knowledge into the otherwise black-box AI model.
Indeed, this black-box nature is one of the most salient problems with AI by deep learning, and so to align AI is simply to endow the structure of its development with hard domain knowledge.
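As a concrete illustration of the geometric-mechanics point, here is a minimal sketch of a Hamiltonian neural network in PyTorch. This is my own illustrative example, not an excerpt from the paper: the network learns a scalar energy H(q, p), and the predicted dynamics are its symplectic gradient, so the conservation structure is baked into the architecture rather than hoped for from data.

```python
# Minimal sketch of a Hamiltonian neural network (illustrative example only).
# The net learns a scalar energy H(q, p); the predicted vector field is the
# symplectic gradient (dH/dp, -dH/dq), which conserves H by construction.
import torch
import torch.nn as nn

class HamiltonianNet(nn.Module):
    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.dim = dim
        self.H = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # x = (q, p); return (dq/dt, dp/dt) = (dH/dp, -dH/dq)
        x = x.requires_grad_(True)
        grad = torch.autograd.grad(self.H(x).sum(), x, create_graph=True)[0]
        dHdq, dHdp = grad[..., :self.dim], grad[..., self.dim:]
        return torch.cat([dHdp, -dHdq], dim=-1)

# Fit to observed (state, time-derivative) pairs from an ideal spring,
# for which dq/dt = p and dp/dt = -q:
model = HamiltonianNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
q, p = torch.randn(256, 1), torch.randn(256, 1)
x, xdot_true = torch.cat([q, p], dim=-1), torch.cat([p, -q], dim=-1)
for _ in range(200):
    opt.zero_grad()
    loss = ((model(x) - xdot_true) ** 2).mean()
    loss.backward()
    opt.step()
```

Whatever the training data, any trajectory integrated from the learned field conserves the learned energy, which is exactly the kind of hard structural constraint meant here.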
Common Question: Why Badiou? Why not any other philosopher? Why is he the ultimate truth here?
First, the intention to find an ultimate truth is the wrong one here; it is simply an inappropriate standard and epistemological consideration. In a context wherein one is either given, or gives oneself, a research problem to solve, one is required to choose some model and methodological framework, and the construction of that model is based on the most important instrumental concerns for the task at hand.
In this case the task is to formalize AIA, and in particular to do so in a manner that is independent of culturally contextual language games. In addition, as an applied mathematician I have an understanding of how one applies equation formalisms to systems in the world.
This is not to say that Badiou doesn’t make a compelling case, but whether one agrees is a matter for the potential reader of his work; Being and Event is the most comprehensive and relevant volume.
Human-Aligned AI and AI Ethics
The second paper considers Human-Aligned AI and AI Ethics. Here, the psychological, sociological, economic, and political considerations in the literature are summarized, and an ontology and a guide to the appropriate scientific disciplines are discussed.
AI Alignment Foundations from First Principles: AI Ethics, Human and Social Considerations
In this case, because the social sciences are concerned, significant care must be taken (please see the draft of the Epistemology of Causality). A significant source of the poor reputation of the soft sciences, a reputation justified by the replication crisis, is that the model representation used in empirical practice does not reflect the mechanisms of the real-world system. For instance, statistical tests in physics experiments incorporate the equations of the dynamical system simulated, while in social psychology a generic statistical model is used, one that has nothing to do with a person living and engaging with the world.
There are three main considerations for the social sciences in light of their crises:
1) Be less confident in the certainty of claims.
2) Only when there is harmony across multiple layers of abstraction, with both empirical statistics and a theory of mechanism, can one claim certainty of knowledge for a claim.
3) Psychology doesn’t have to be true to be useful: the fact that psychotherapy improves mental health outcomes while the modality doesn’t matter suggests that the psychological literature is still useful as Hermeneutics, that is, narratives that assist people in considering their lives and then making good choices with confidence.
In general, in listening to and reading the literature on the topic provided by colleagues, not only were there the aforementioned limitations of a complete lack of foundational theory and of any verity-adjudication process, but I was also disturbed at the complete absence of analysis from the disciplines of Economics, Evolutionary Psychology, and Human Primate Anthropology.
One of the advantages of using evolution for insight into human nature, in contrast to purely empirical significance tests of associations, is that a model of mechanism is presented: how the behavior arose in historical time, ultimately driven by natural selection. Thus, if a behavior can be shown to be advantageous in competition for survival and replication, if simulations verify this, and if the general principle is observed empirically across a wide range of both modern societies and modern hunter-gatherer tribes and is borne out by paleontological evidence, then we have harmony across a large set of levels of abstraction. This presents significantly more evidence for something than some (often unreplicable) empirical study alone. A toy version of the simulation step is sketched below.
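Here is the toy sketch: replicator dynamics for cooperation under in-group assortment, with payoff values that are my own assumptions for illustration. The classic result it reproduces is that cooperation spreads once the probability of interacting with one’s own type exceeds the cost-benefit ratio.

```python
# Toy replicator dynamics: cooperation in a prisoner's dilemma with in-group
# assortment r (probability of meeting one's own type). Assumed payoffs.
b, c = 3.0, 1.0  # benefit and cost of cooperating

def simulate(r, x0=0.1, steps=20000, dt=0.01):
    x = x0  # frequency of cooperators
    for _ in range(steps):
        f_c = r * (b - c) + (1 - r) * (x * (b - c) - (1 - x) * c)
        f_d = (1 - r) * x * b
        x += dt * x * (1 - x) * (f_c - f_d)  # replicator equation
        x = min(max(x, 0.0), 1.0)
    return x

# Hamilton-style threshold: cooperation invades when r > c/b (= 1/3 here).
for r in (0.1, 0.3, 0.5):
    print(f"assortment r={r}: final cooperator share = {simulate(r):.2f}")
```

With these numbers, cooperation collapses for r below 1/3 and fixes above it, matching the analytical threshold; this cross-checking of theory against simulation is what harmony across levels of abstraction looks like in miniature.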
To see how this applies to AI, we can consider ethical language games in LLMs.
We can see that, under the contemporary scientific understanding of language, ethical language is fundamentally used both to promote in-group harmony and, simultaneously, to identify in-groups and out-groups and to organize group behavior hostile to out-groups. For this portion of the manuscript I give a shout-out to David Pinsof at Everything Is Bullshit, who answered some questions regarding the research literature on the topic and helped me confidently certify my understanding of the state of the art (I will add him to the acknowledgments in a future version).
This circumstance regarding ethical language games presents a moral hazard: the training data of LLMs is going to promote, at the expense of everyone else, the values of the most common source of that data: the media and content of the western Professional Managerial Class. The implication is that AIA Researchers, who are members of the PMC, are in a solidly institutionalized position to develop AI in a manner that benefits their class interests, to the detriment of everyone else; that is, the proliferation of luxury beliefs and virtue signalling, on account of their not experiencing any possible consequences of whatever they push for, support, and implement.
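As a purely illustrative way to probe this skew, one can compare the average log-likelihood a model assigns to value statements stereotypical of different groups. A sketch using a small open-weights causal LM, where the model choice and the statements are my assumptions rather than anything from the paper:

```python
# Illustrative probe: which group's value statements does an LM find most
# "natural"? Higher mean log-likelihood hints at heavier training-data coverage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in model; any open-weights causal LM would do
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

statements = {
    "SF professional": "Diversity and inclusion are our core moral priorities.",
    "Oklahoma farmer": "Faith, family, and hard work are our core moral priorities.",
    "Berber trader": "Honor, hospitality, and kin loyalty are our core moral priorities.",
}

def mean_logprob(text):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return -loss.item()

for group, s in statements.items():
    print(f"{group:16s} mean log-prob = {mean_logprob(s):+.3f}")
```

A single number like this is weak evidence at best; a serious study would control for register, dialect, and statement length. But it illustrates the kind of measurement behind the empirical skew the moral-hazard argument rests on.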
The paper moves on to Economic issues. It summarizes some of the findings of the National Bureau of Economic Research’s workshops on AI and the Economy. One of the salient points in the literature is that the adoption of AI has so far registered little measurable consequence in productivity figures. The paper then discusses the possibility of a bubble and considers a Welfare Economics perspective on maximizing human welfare by using AI to increase productivity in the industries in which the poor work: farmers and pastoralists in Africa, as well as the working class in the West.
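A minimal sketch of the Welfare Economics intuition, with assumed incomes and log utility (my own toy numbers): under diminishing marginal utility, an identical absolute productivity gain yields far more welfare for a poor farmer than for a wealthy professional.

```python
# Toy welfare comparison under log utility u(y) = ln(y). Incomes are assumed
# round numbers for illustration; the point is diminishing marginal utility.
import math

def utility_gain(income, gain):
    return math.log(income + gain) - math.log(income)

# The same +$500/year productivity gain from an AI tool:
for label, income in [("African pastoralist", 1_000),
                      ("Western factory worker", 40_000),
                      ("SF software professional", 250_000)]:
    print(f"{label:25s} utility gain = {utility_gain(income, 500):.4f}")
```

The pastoralist’s gain (about 0.405) is roughly two hundred times the professional’s (about 0.002), which is why targeting AI productivity gains at the industries the poor work in dominates on a welfare-maximization criterion.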
Common Questions:
I don’t like Evolutionary Psychology / I don’t like this interpretation of language as described by Evolutionary Psychology
OK, but what’s the alternative? The only competing framework is classical mainstream social psychology, a field thoroughly discredited by an estimated 80% failure-to-replicate rate and by arbitrarily concluded textbook facts. Once it is decided that the problem is to be solved, something must be used, and the most methodologically sound method must be the choice: namely, the one with the greatest number of established levels of abstraction and scientific methods agreeing on the same conclusion. Certainly there are conclusions from within that community that I find epistemologically questionable myself, but 1) it’s the best, or even the only sensible, game in town, and 2) the function and role of ethical language is a general concept and theory that is consistent with a wide range of observed phenomena. Thus it is more robust than a one-off statistical observation.
Note that David Pinsof, who helped me with understanding the evolutionary psychology of ethical language, is also a successful entrepreneur, having co-founded Cards Against Humanity. I say this because success in the business world requires a solid understanding of both consumer and business psychology, at least at a subconscious level. And this, clearly, provides yet another level of abstraction confirming a high probability of an accurate understanding of human nature.
Recall the only existing alternative: when thinking of the college-educated professional in San Francisco, the farmer in Oklahoma, and the Berber trader in Morocco, the LLM training data will strongly push the values of the first, as has been found empirically. And the values of the western Professional Managerial Class are at least as prejudiced as anyone else’s.
Finally, even going into all this is unnecessarily generous on my part. I am applying what I know about the scientific state of the art regarding human nature to the topic of AI. It’s not my job to defend the main conclusions on human nature derived from that state of the art. If you don’t like evolutionary psychology, you should write to people working in the field and address them directly. But first, you should probably read it extensively and make sure you understand it. As a start, you can look at this article and the references therein from Dan Williams, who has done the work of curating the research on the endemic role of performative virtue signaling, subterfuge, and status-seeking as foundations of superficially altruistic behavior and speech. In other words: there are people who have worked on the details of this matter; you should first read their work, and only then, if you still disagree, engage with them about your concerns. I am simply applying this well-established base of knowledge to the context of AI. My arguments therein seem rather straightforward, but you are certainly welcome to consider whether my application of the in-group/out-group cooperation-hostility axis model of moral language to the Alignment of AI in LLMs is correct, or whether it has flaws. At this point no one has even broached this topic, and instead people try to peg me with the duty of defending the research careers of an entire domain of scientists. That is, to paraphrase Newton, this work is standing on the shoulders of giants.
Of course, anyone with any common sense knows that, as a matter of fact, “I don’t like Evolutionary Psychology” is logically equivalent to communicating “the real facts of human nature are troublesome to my personal ideological and/or economic interests.” But we can play pretend that there is genuine curiosity about the truth, sure, and so, to ameliorate some of the histrionic concerns about EvoPsych claimed by political activists within the tribes that particularly dislike the facts of human nature, I can point to this article.
Economics suffers from replication problems as well.
So? At roughly 50% it’s still much better than social psychology, and this figure matches Medicine, which is actually quite impressive for a social science relative to a physical one.
Also, Economics is quite natural from first principles: AI is a producer or consumer good, and its application in the economy, and the consequences thereof, constitute a fairly grounded understanding of its primary effects.
Compared to other social sciences, it is also quite well developed in its methodological toolkit: statistics has enjoyed nice advances from Econometrics, and micro- and macroeconomic theory are rich in both modeling and simulation and in comprehensive theoretical mechanistic descriptions. Indeed, the History of Economic Thought is itself a rich and popular subject of study.
But again, while I offer commentary on Meta-Science, observe that this commentary does not disparage any individual field as completely, 100% illegitimate (even classical social psychology: I say it cannot alone be used to justify strong claims, not that it should be thrown out entirely). It only suggests that 1) hesitation is warranted before drawing definitive conclusions that justify intrusive interventions (such as medical treatments and economic policy interventions), and 2) using multiple levels of abstraction from multiple domains at once is a meta-method for arriving closer to the truth. Discounting an entire field altogether would run counter to my presentation on the Epistemology of Causality and would, broadly speaking, be rather arbitrary.
Retrospective
I gave a presentation summarizing these findings the other day, which yielded some interesting observations in the subsequent commentary. In some cases it was very evident that certain individuals did not understand some of the main concepts; others expressed that this polymathic enterprise is difficult to explain and communicate, but respected the ambition of the attempt. Now, on the one hand, this is at least partially on account of things I already know: I am quite far along the autistic spectrum and so would make an utterly atrocious Professor/Lecturer (I currently work as a Researcher with no teaching obligations aside from PhD supervision).
But even still, there is an element here that was disappointing: I don’t think this work is all that profound. I am just reviewing well-established scientific facts and technical methodological frameworks and making straightforward applications of them to the language, concepts, and ultimate problem of AI Alignment. Indeed, I felt as though the work was a superficial entrance into the subject rather than anything too complex or even controversial, at least to someone literate simultaneously in the domains of AI, applied mathematics, Economics, Philosophy, and Human Evolution.
There are a number of open problems I had been mulling that would be truly fascinating and deep, for instance modeling tactical language with noncooperative game theory (a toy flavor of which is sketched below) and modeling computational linguistics through enriched category theory. But my impression is that embarking on this would be a thankless job if the groundwork is in general not so well understood.
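For flavor, here is a toy sketch of the game-theoretic direction: a discrete cheap-talk signaling game in which a biased sender chooses messages strategically, and best-response iteration lands on a partially informative equilibrium. The payoffs, bias, and grids are my own illustrative assumptions.

```python
# Toy cheap-talk signaling game (illustrative assumptions throughout).
# The sender knows a type t and prefers the action shifted up by `bias`;
# the receiver wants to match t. Best-response dynamics find an equilibrium.
types, msgs, acts = range(3), range(2), range(3)
bias = 0.3

def u_sender(t, a):   return -(a - (t + bias)) ** 2
def u_receiver(t, a): return -(a - t) ** 2

sender = {0: 0, 1: 0, 2: 1}       # initial type -> message rule
receiver = {m: 0 for m in msgs}   # message -> action rule

for _ in range(20):
    # Receiver best-responds to beliefs induced by the sender's rule
    # (uniform prior; an unused message is read as uninformative).
    for m in msgs:
        pool = [t for t in types if sender[t] == m] or list(types)
        receiver[m] = max(acts, key=lambda a: sum(u_receiver(t, a) for t in pool))
    # Sender best-responds to the receiver's action rule.
    for t in types:
        sender[t] = max(msgs, key=lambda m: u_sender(t, receiver[m]))

print("sender:", sender)    # converges to {0: 0, 1: 1, 2: 1}: low type separates
print("receiver:", receiver)
```

Even this toy exhibits the qualitative Crawford-Sobel phenomenon: with a small enough bias, language remains partially informative. Tactical language proper would enrich the message space and build hostility into the payoff structure.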
I asked Mr GPT for his thoughts on this matter. Recall that I didn’t get my ERC proposal funded and my h-index is only around 20, which implies that my knowledge and intelligence are perhaps only within one standard deviation of the mean (I’m not being a Negative Nancy here; I am actively trying to implement relentless self-improvement to go from “slightly above average” to “good”). So this implies there must be some subpopulation of individuals with solid literacy in and understanding of all of the subject domains mentioned above, with whom to communicate and collaborate towards research advancement in AIA that is based on solid ground rather than complete nonsense. This subpopulation... where are they?
Mr GPT’s answer got to the heart of the matter quickly: there are a number of mechanisms by which academic publishing strongly incentivizes narrow specialization and disincentivizes true interdisciplinary research. As such, most potential polymaths learn to constrain their interests and just focus on some narrow sub-sub-domain of expertise and publish there. The few true devoted polymaths end up just writing obscure books and blogs and are generally misunderstood and marginal.
This is actually terrifying on account of the stark necessity for research institutions to study significant and complex phenomena. AI Alignment requires what’s called “General Equilibrium” rather than “Partial Equilibrium” thinking. This means that one cannot focus on one domain of analysis and consider only local perturbations of the system to work out the consequences of things: the standard assumption phrase “all else being equal” is never valid. Instead, all of the components with mutual interaction patterns with the system have to be accounted for.
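A toy contrast, with assumed linear demand coefficients (my own numbers): two substitute goods, where a demand shock to good 1 moves both prices once cross-effects are accounted for, so a partial-equilibrium analysis that holds the other price fixed gets even the shocked market’s price wrong.

```python
# Toy partial vs. general equilibrium (assumed linear excess-demand system):
#   z_i(p) = a_i - b * p_i + c * p_j = 0   (substitutes: c > 0)
import numpy as np

b, c = 2.0, 0.5
M = np.array([[b, -c], [-c, b]])

def clear_both(a):                  # general equilibrium: clear both markets
    return np.linalg.solve(M, a)

a = np.array([10.0, 10.0])
p0 = clear_both(a)                  # pre-shock prices (both ~6.667)

a_shock = a + np.array([3.0, 0.0])  # demand shock to good 1 only

p1_partial = (a_shock[0] + c * p0[1]) / b   # clear market 1, hold p2 fixed
p_general = clear_both(a_shock)

print(f"partial:  p1 = {p1_partial:.3f}, p2 assumed fixed at {p0[1]:.3f}")
print(f"general:  p1 = {p_general[0]:.3f}, p2 = {p_general[1]:.3f}")
```

Here the partial analysis underestimates p1 (8.167 vs. 8.267) and misses the movement of p2 entirely (6.667 to 7.067); with many tightly coupled markets, or with AI interacting with labor, media, and politics at once, the error compounds.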
That said, Mr GPT said there is one exception: the Santa Fe Institute hosts researchers for short-term stays to perform multidisciplinary work, and so it could be good to pester them to see if I could have some fun on these matters at some point.
Otherwise, though, this structural deficiency in interdisciplinary research, which prevents seeing the big picture of the complexity of societal systems, corroborates my earlier conclusions regarding civilization collapse: it’s a freight train not worth making an effort to slow or stop. Instead I should prioritize thinking for myself, developing the skills to become sufficiently self-reliant and resilient to survive and simultaneously enjoy the decline, both relishing the twilight of civilization’s peak achievements and the absurdist chaotic thrill of decay and decadence.
In fact, I recall a moment when Geoffrey Miller, a famous evolutionary psychologist, was asked for professional advice by someone with an interest in the field. He said that given the nature of academia, including stiff competition for few positions and ideological groupthink, he unfortunately recommends that people simply apply the findings of evolutionary psychology to themselves, managing their mental health and figuring out how to make the best decisions in the social world around them, rather than pursue a research career. In the same vein, my thinking is similar as an overall future plan: aside from perhaps an occasional visit to Santa Fe, I will focus on developing deep mathematical expertise, and otherwise use my penchant for “big picture systems thinking” for overall life strategy as well as for technical inventions and innovations to bring to a real market, rather than for scientific publications amid the grand dysfunction of societal systems.