Anthropic has discovered how to peek inside AI

I remember when, just five years ago, artificial intelligence was only used to classify dogs and cats. Back then a mere mortal like me could still run a few experiments in Google Colab, such as training a small model on a camera feed to recognize the boss's face and warn us when he was approaching.

It is the same as with children: while they are at school you can help them with their homework and even impress them, but once they enter university their parents no longer understand what the hell they are talking about. Artificial intelligence has grown so complicated that its models have become black boxes that not even their creators can understand.

We can hardly protect ourselves from the adverse effects of AI if we cannot tell how it reaches its conclusions and makes its decisions. That is why AI expert Chris Olah, co-founder of the startup Anthropic, leads a team of researchers that has set out to decipher what happens inside the brain of the beast.

Over the past year, the researchers began their experiments with a very small AI model that uses only a single layer of neurons to discover patterns. Little by little they moved to models with more layers, until they tried the approach on a medium-sized version of their language model, Claude.

After several failed attempts, the team began to associate neural patterns with concepts that appeared in the model's output. At that point, the Anthropic team went a step further to see whether they could use that information to change Claude's behavior by strengthening or weakening certain concepts.
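
To make the idea of turning a concept up or down a little more concrete, here is a toy sketch in Python. It is only an illustration under strong assumptions and not Anthropic's actual method: it pretends that a model's internal state is a short list of numbers and that a concept corresponds to a single direction in that space. The names `feature_strength` and `steer`, and all the numbers, are invented for this example.

```python
import math


def unit(vector):
    """Scale a vector to length 1 so that strengths are comparable."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def feature_strength(activation, direction):
    """How strongly the concept is present: projection onto its direction."""
    d = unit(direction)
    return sum(a * x for a, x in zip(activation, d))


def steer(activation, direction, amount):
    """Push the activation along the concept direction (negative = weaken)."""
    d = unit(direction)
    return [a + amount * x for a, x in zip(activation, d)]


# Hypothetical internal state of a model and a hypothetical concept direction.
activation = [0.5, -1.0, 2.0, 0.0]
concept = [0.0, 0.0, 3.0, 4.0]

before = feature_strength(activation, concept)

# Strengthen the concept by 5 units, then cancel it out entirely.
boosted = steer(activation, concept, 5.0)
dampened = steer(activation, concept, -before)

print("before:  ", round(before, 3))
print("boosted: ", round(feature_strength(boosted, concept), 3))
print("dampened:", round(feature_strength(dampened, concept), 3))
```

In this toy version, strengthening adds to the concept's measured strength and weakening can bring it down to zero, while the parts of the state that are unrelated to the concept stay untouched. Real models have vastly more dimensions, and finding the directions in the first place is the hard part of the research.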

With this experiment, the researchers found several features that represented potentially dangerous practices, such as insecure computer code or instructions for making dangerous products.
There is no doubt that, if it works, this will do more for our security than all the regulations that governments can draw up.

More information
https://www.anthropic.com/news/mapping-mind-language-model

https://www.infobae.com/america/the-new-york-times/2024/05/22/opinion-las-cajas-negras-de-la-ia-ya-son-un-poco-menos-misteriosas/