
Tversky Neural Networks
by che_shr_cat | 5 Comments | 2 days ago

Lerc | 1 day ago | 1 Comment

It seems a bit much to stick a Proper Noun in front of Neural Networks and call it a new paradigm.

I can see how that worked for KANs because weights and activations are the bread and butter of neural networks. Changing the activations kind of does make a distinct difference. I still think there's merit in having learnable weights and activations together, but that's not very Kolmogorov-Arnold theorem, so activations-only seemed like a decent starting point (but I digress).

This new thing seems more like just switching out one bit of the toolkit for another. There are any number of ways to measure how alike one bunch of values is to another. Cosine similarity, despite sounding all intellectual, is just a dot product wearing a lab coat and glasses. I assume it's easily acknowledged as not the best metric, but it really can't be beat for performance if you have a lot of multiply units lying around.
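
To put that concretely, here's a rough sketch of both: cosine similarity as a normalized dot product, next to a featurewise reading of a Tversky-style index. The min/ReLU parameterization is my own illustration, not necessarily how the paper sets it up.

```python
# Cosine similarity vs. a featurewise Tversky-style index (illustrative only).
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # the dot product "wearing a lab coat": normalize, then multiply-accumulate
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def tversky_index(a: np.ndarray, b: np.ndarray, alpha: float = 0.5, beta: float = 0.5) -> float:
    # treat non-negative vectors as fuzzy feature sets:
    # shared mass = elementwise min, distinctive mass = positive remainder
    common = np.minimum(a, b).sum()
    a_only = np.maximum(a - b, 0.0).sum()
    b_only = np.maximum(b - a, 0.0).sum()
    return float(common / (common + alpha * a_only + beta * b_only))

a = np.array([1.0, 0.0, 2.0, 0.5])
b = np.array([0.8, 0.3, 1.5, 0.0])
print(cosine_similarity(a, b), tversky_index(a, b))
```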

It would be worth combining this research with the efforts to translate embeddings from one model to another. Transferring between metrics might allow you to pick the most appropriate one at specific times.
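
As a strawman for what I mean by transferring: fit a map between paired embeddings, then apply whichever metric you like on the target side. The least-squares linear map and the synthetic pairs below are assumptions for illustration, not a claim about how the actual embedding-translation work does it.

```python
# Fit a linear map from embedding space A to space B on paired embeddings,
# then apply the metric of choice in the target space. Synthetic data only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 384))                         # embeddings from model A
W_true = rng.normal(size=(384, 768))
B = A @ W_true + 0.01 * rng.normal(size=(1000, 768))     # "model B" embeddings of the same items

# least-squares map A -> B
W, *_ = np.linalg.lstsq(A, B, rcond=None)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

a_new = rng.normal(size=384)
b_pred = a_new @ W                                       # translate, then measure however you like
print(cosine(b_pred, a_new @ W_true))
```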

heyitsguay | 2 days ago | 2 Comments

Seems cool, but the choice of image classification benchmark is kinda weak given all the fun tools we have now. I wonder how Tversky probes do on top of DINOv3 for building a classifier for some task.
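
Something like this is what I have in mind, as a rough sketch only: a small Tversky-style head trained on frozen backbone embeddings. The min/ReLU parameterization and the random placeholder data standing in for DINOv3 features are my assumptions, not the paper's layer.

```python
# A Tversky-style probe head on frozen embeddings (illustrative sketch).
import torch
import torch.nn as nn

class TverskyProbe(nn.Module):
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, dim))
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.beta = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(x)                                    # treat features as non-negative "set" masses
        p = torch.relu(self.prototypes)
        common = torch.minimum(x[:, None, :], p[None, :, :]).sum(-1)  # shared features
        x_only = torch.relu(x[:, None, :] - p[None, :, :]).sum(-1)    # distinctive to the input
        p_only = torch.relu(p[None, :, :] - x[:, None, :]).sum(-1)    # distinctive to the prototype
        return common / (common + self.alpha.abs() * x_only + self.beta.abs() * p_only + 1e-6)

# Placeholder tensors standing in for frozen DINOv3 embeddings and labels.
feats, labels = torch.randn(512, 768), torch.randint(0, 10, (512,))
probe = TverskyProbe(dim=768, num_classes=10)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(20):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(feats), labels)
    loss.backward()
    opt.step()
```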


dkdcio | 2 days ago | 3 Comments

> Another useful property of the model is interpretability.

Is this true? My understanding is that the hard part about interpreting neural networks is that there are many, many neurons with many, many interconnections, not that the activation function itself is unexplainable. Even with an explainable classifier, how do you explain trillions of them with deep layers of nested connections?


roger_ | 1 day ago | 1 Comment

Interesting, can this be applied to regression?

tpoacher | 1 day ago | 1 Comment

Fools. Everybody knows a TLA (three-letter acronym) is instantly more marketable than a two-letter one (also abbreviated TLA, but we don't talk about Bruno and all that jazz).

You should have called it the Amos-Tversky Network, abbreviated ATN. An extra letter instantly increases the value of the algorithm by three orders of magnitude, at least. What, you think KAN was an accident? Amateurs.

Now you just sound like you're desperately trying to piggy-back on an existing buzzword, which has the same feel as "from the producer of Avatar" does.

Everybody knows a catchy name is more important than the technology itself. The catchy title creates citations, and citations create traction. And good luck getting cited with a two-letter acronym. Everybody knows it's the network effect that drives adoption, not quality; just look at MS Windows.

What. You think anyone gave a rat's ass about nanotechnology back when it was still just called "chemistry"?

/s