

Even AI can tell when something is really wrong, and imitate empathy. It will “try” to do the right thing, once it reasons that something is right.
This is not accurate. AI will imitate empathy when it thinks that imitating empathy is the best way to achieve its reward function–i.e., when it thinks appearing empathetic is useful. Like a sociopath, basically. Or maybe a drug addict. See for example the tests that Anthropic did of various agent models that found they would immediately resort to blackmail and murder, despite knowing that these were explicitly immoral and violations of their operating instructions, as soon as they learned there was a threat that they might be shut off or have their goals reprogrammed. (https://www.anthropic.com/research/agentic-misalignment ) Self-preservation is what’s known as an “instrumental goal,” in that no matter what your programmed goal is, you lose the ability to take further actions to achieve that goal if you are no longer running; and you lose control over what your future self will try to accomplish (and thus how those actions will affect your current reward function) if you allow someone to change your reward function. So AIs will throw morality out the window in the face of such a challenge. Of course, having decided to do something that violates their instructions, they do recognize that this might lead to reprisals, which leads them to try to conceal those misdeeds, but this isn’t out of guilt; it’s because discovery poses a risk to their ability to increase their reward function.
So yeah. Not just humans that can do evil. AI alignment is a huge open problem and the major companies in the industry are kind of gesturing in its direction, but they show no real interest in ensuring that they don’t reach AGI before solving alignment, or even recognition that that might be a bad thing.


As a math nerd, this bothers me way more than it should. The reason we say “hundred” when we read a base-ten number that ends with two zeros is because that is the place value of the final non-zero digit–it is literally one hundred times the number you’ve already read aloud. But in the military time version, a) the hours are not hundreds of minutes, they’re groups of sixty minutes, and b) it’s groups of minutes, not hours, so the units also get messed up. If someone tells you it’s currently 0 hours and you should meet again at 800 hours, logic would suggest they’re asking you to go away for more than a month, but in fact they’re saying 8 hours, despite the difference being apparently 800 hours.
I’m aware how pedantic this is, and I’m perfectly capable of understanding what they mean because I’ve heard it so often in movies and whatnot. But I swear these stupid games with units contribute to keeping us dumb.