Saturday, April 27, 2013

Watson is Trivial (a crazy-man delusions-of-grandeur conversation with my friend Colt)

Hey Colt,

I believe that I could replicate something like IBM's Watson in a matter of days, coding in BASIC, on my laptop. So could you, though, and I have a party to throw tonight, and an English for Call Centers program to design by Tuesday. And you'd probably be better at it.

I'm going to cut right to the chase.

Wikipedia and the internet itself are semantic networks. Play Hitler Hops. Hitler is one degree away from Rommel, and 4 hops from toenails. Looking for a single, linear shortest path actually doesn't do justice to the way that concepts are linked. Hitler and Rommel have a bajillion links in common. This is all child's play to quantify. I say this because I'm pretty much a child, and it's the kind of thing I've done before.
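Here's a rough sketch of what "quantifying" this could look like. The link data is a made-up toy graph, not real Wikipedia links, and `hops` and `shared_links` are names invented for illustration. One function counts hops with a breadth-first search; the other counts the links two pages have in common, which is the richer measure.

```python
from collections import deque

# Toy link graph (hypothetical data, not real Wikipedia links).
LINKS = {
    "Hitler": {"Rommel", "World War II", "Germany"},
    "Rommel": {"Hitler", "World War II", "Germany", "North Africa"},
    "World War II": {"Hitler", "Rommel", "Germany"},
    "Germany": {"Hitler", "Rommel", "World War II"},
    "North Africa": {"Rommel"},
}

def hops(graph, start, goal):
    """Breadth-first search: fewest link-clicks from start to goal, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        if page == goal:
            return dist
        for nxt in graph.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path in this graph

def shared_links(graph, a, b):
    """Pages that both a and b link to."""
    return graph.get(a, set()) & graph.get(b, set())
```

On the toy graph, `hops(LINKS, "Hitler", "Rommel")` is one hop, while `shared_links(LINKS, "Hitler", "Rommel")` returns the overlap between the two pages, which is the part a single shortest path throws away.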

This isn't my doing, but it's a cool picture.


The connections between concepts are pretty well mapped, at least in a subjective, human way (I imagine that if you had a pseudo-Watson drawing information from much of the web, it would associate alien abductions with government secrets and Justin Bieber with "fags"--this is maybe not the kind of cold objectivity you would want in a godlike personal assistant, especially if you believe that the masses are wrong about a lot of stuff).

Anyhow, I'm thinking about Watson's famous game. If I were writing a ghetto-rigged laptop-Watson, nothing I could ever produce could ascertain that "Chicks Dig Me" was a category about female archaeologists. But a pretty amateurish AI using a Wikipedia SQL dump and the connections therein would be kicking ass in the category after seeing the question. And it'd be lightning fast. I think I could still beat Ken Jennings.

My cousin and soul's-friend Levi took a Wikipedia SQL dump (haha, took a dump) with him when he went to Namibia for the Peace Corps. The download is surprisingly small. Lee's also done language-stuff with Wikipedia dumps when he was competing for the Netflix prize, so it's doable, processing-wise. Also, this has nothing to do with anything, but when he was in Namibia living in a mud hut with no internet connection and a solar charge, he wrote a blind chess AI that he took to the ICGA Olympiad in Pamplona and won the silver medal with, beating the Beijing University computer science team. I love that shit. It's like Iron Man inventing that suit in a cave.






It's like this: after we have the first question, we can isolate the words that fall under a certain threshold in a corpus of written English. I do this all the time to isolate domain-specific vocabulary for English for Special Purposes curricula.

Here are the 5,000 most common English words. You'd want a different threshold than that (I'd start with 10,000), but we could tinker and find out where the bar needs to be. You can build a list like this using a few lines of Python and some Project Gutenberg texts, or you can just download a better one.
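The "few lines of Python" might look something like this. It's a minimal sketch: `common_words` and `rare_terms` are invented names, the tokenizer is a crude regex, and the corpus is whatever text you feed it (a stack of Project Gutenberg books, say).

```python
import re
from collections import Counter

def common_words(text, top_n):
    """Return the set of the top_n most frequent lowercase words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w, _ in Counter(words).most_common(top_n)}

def rare_terms(question, common):
    """Keep the question's words that are not in the common-word set, in order."""
    words = re.findall(r"[A-Za-z']+", question)
    return [w for w in words if w.lower() not in common]
```

You'd call `common_words(corpus_text, 10_000)` once, save the result, and then run `rare_terms` on each question. The tokenizer leaves possessives like "Kenyon's" attached, and numbers get dropped, so a real version would want cleanup on both counts.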

Now we have our question:


Kathleen Kenyon's excavation of this city mentioned in Joshua showed the walls had been repaired 17 times. 

Once we strip the common words we have:


Kathleen Kenyon, excavation, Joshua

If we were to look for the page with the strongest linkage, Hitler Hops-wise, to these three things, we'd probably (we would, I just ran a test) already have the answer "Jericho." And since our search can function by starting with the Wikipedia pages associated with these words and spidering out, we're not doing an ungodly amount of processing; we're not wasting our time combing through pages about the Andromeda Galaxy and snakes indigenous to South America.
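A sketch of the "strongest linkage" idea, under loud assumptions: the link sets below are invented for illustration (they are not pulled from a real dump), `best_linked` is a hypothetical name, and it only looks one hop out from each seed page. It just counts how many seed pages point at each candidate.

```python
from collections import Counter

# Hypothetical toy data: seed page -> pages it links to.
LINKS = {
    "Kathleen Kenyon": {"Jericho", "Archaeology", "Jerusalem"},
    "Excavation (archaeology)": {"Archaeology", "Jericho"},
    "Book of Joshua": {"Jericho", "Canaan", "Moses"},
}

def best_linked(graph, seeds):
    """Rank pages by how many seed pages link to them (one hop out)."""
    votes = Counter()
    for seed in seeds:
        for target in graph.get(seed, ()):
            if target not in seeds:  # a seed can't be its own answer
                votes[target] += 1
    return votes.most_common()
```

Because the search only ever touches pages reachable from the seeds, the Andromeda Galaxy never enters the picture. A fuller version would spider out more than one hop and weight closer pages more heavily.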

But just in case we're not already to Jericho, Jeopardy questions give us an extra category key. In most questions, we look for the noun phrase that follows the word "this", or we look for the words "she" or "he", which indicate that the answer should be a person. (There are other kinds of Jeopardy questions, like fill-in-the-blank answers, which are pain-in-the-ass exceptions that have to be treated with their own algorithms, but thankfully finite.)

So we have:

Kathleen Kenyon's excavation of this city mentioned in Joshua showed the walls had been repaired 17 times. 

Nodes to spider out from: Kathleen Kenyon, excavation, Joshua. Category: city. The answer has to be a city.
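The category key can be roughed out with a couple of regular expressions. This is a crude sketch, not a real noun-phrase parser: `answer_category` is a made-up name, and it grabs only the single word after "this".

```python
import re

def answer_category(clue):
    """Guess the answer type from a Jeopardy-style clue.

    Returns the word right after "this", or "person" for he/she clues,
    or None when neither pattern shows up.
    """
    match = re.search(r"\bthis\s+([A-Za-z]+)", clue, re.IGNORECASE)
    if match:
        return match.group(1).lower()
    if re.search(r"\b(she|he)\b", clue, re.IGNORECASE):
        return "person"
    return None
```

On the Kenyon clue this returns "city". The one-word grab is the weak point: "This mystery author" comes back as "mystery", so anything past a toy would need proper noun-phrase chunking.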

Now this kind of categorization is something that I can't sketch a flow-chart for, but it's been done. If it requires a huge database and a lot of processing, it's the Achilles' heel of my process, and this whole idea is wrong (unless we don't need the category).

I look at most questions, and this method seems to kick ass. I look at some (like the next question in Chicks Dig Me), and it fails:

This mystery author and her archeologist hubby dug in hopes of finding a lost Syrian city of Arkesh.
Hubby is rare enough to pass my threshold test, but takes gravity away from the semantic area we need to be in, and won't have any better connection than random to the answer (Agatha Christie). (Thankfully, though, it also doesn't have a Wikipedia page, and we're saved from it by good luck.) Even if I could handle categories pretty well, I probably wouldn't have them down to a resolution that could handle "mystery authors" without having a fuckload of data and processing power. Worst of all, Arkesh doesn't exist anywhere on the internet. Like, seriously. And there are big pages about Agatha Christie and her archeology. It almost makes me think that the Jeopardy question is wrong. Whatever the case, Watson got it, and with linking from "archaeologist" and "Syrian" with no category, I'd be toast.

But for most questions, I'd still kick some ass.

The next one:

At the Olduvai Gorge in 1959 she and hubby Louis found a 1.75 million year old Australopithecus boisei skull. 
I'd kill it.

The next:

Harriet Boyd Hawes was the first woman to discover and excavate a Minoan settlement on this island.
Yep, I'd kill it.

So I'm wondering if this is a Pareto Principle thing where Watson needs three million dollars and a supercomputer and a team of PhDs working years--thousands of times the effort needed for my system--in order to pick up an extra 20% in accuracy, or if the problem is that they were incredibly silly, and their system is godlike at turning Jeopardy questions into database queries, and they have an awesome database, but they totally neglected to go for the low-hanging fruit of all the meta-textual semantic associations we've piled up in places like Wikipedia.

Whatever the case, if I can get a normal professorship where I get summers off, I'd like to try to make myself a Star Trek computer/Watson/Jarvis/HAL thing. Well, at least one that can sometimes answer Jeopardy questions. I mean, we'll all have them in a few years, but it'd be cool to be one of the first.

Thursday, April 25, 2013

Why Google Reader Is Really Shutting Down: A Frog's Eye View.

Google announced a while back that they were pulling the plug on Google Reader. Feedly got half a million sign-ups in the next 48 hours. I don't know what percentage of Google Reader users 500,000 people is, but if you have a commercial website, I bet you'd love to have and keep even that fraction.

What site with half a million (actually, probably several million) customers shuts down?

The one that isn't making money.

I remember Richard M. Weaver, an old mid-century American metaphysical philosopher who inspires me at some points and enrages me at others, saying in his Ideas Have Consequences that the stereopticon (if he was writing in the hipster vernacular he would have called it "The Man") tries to keep us dumb and miserable because the philosopher is a "notoriously poor consumer."

Smart people don't click a lot of ads.

With a market that big that wants something, and with people who can deliver, this sucks, and it seems like a tractable problem. Maybe they should charge for Google Reader? Nope. I wouldn't pay.

There's not enough genius, world-wrecking shit out there for me to pay. Because content producers, even ones like me who do it not for money but out of psychotic zeal, don't want to do it if no one is reading. And not enough people are reading. Because there aren't enough genius world-wrecking writers to produce enough content to support a product that they'll pay for.

Good writers and people who see the value of wisdom, insight, and poetry are too rare, and lost in the margins, for now.

People who sling real thought for a living are like old trannies doing their best in the circus as bearded ladies.

So we get philosophy-as-comedy with the likes of Carlin, or analysis-as-comedy with the likes of Jon Stewart. If you can latch onto a market that's alive, maybe someone who's listening will happen to be in the crowd. If the true religion ever comes, it will have to start as a hilarious joke.

I've discovered the secrets to life and the trajectory of civilization, but there's not a good subreddit to put it in. Lol. #fifthworldproblems.

Books are destroying everything.

Setting: the century following Gutenberg.
People are becoming losers. It's pathetic. Before, people talked to each other, but now they're all hunched over with their faces buried in these devices. Go outside. Play. Fuck.

It's like an addiction. It's like drunkenness. And it's everywhere. It was ok when people had one book, but now words are everywhere. Now, you go to the university, and you're given text. You go to the barbershop and it's on the wall, assaulting you with commercial offers. People are carrying these portable "newspapers" and you can see them just sitting there like zombies in the plaza, ignoring everyone around them, with their faces buried in words.

I don't know what's going to come of it. It seems like people are wasting their whole lives.


Every Luddite has a point. Obesity is a good enough reason to temper your relationship to little pages and screens.

But when I hear an abstainer, I worry. My life would be nothing without facebook. If facebook hasn't made you a completely different person with a vastly more fulfilling life, I should teach you how to use it (and you should sign up for Graph Search.)

Have we given up our humanity by redefining our relationships according to the constraints of the present technology? Sometimes. Let's work on changing the constraints.

Do we look like fools hunched over little screens all day? Yeah. Let's work on that too.

I (non-sarcastically) still haven't been convinced that taking in language through the eyes is an improvement over the way we evolved to take it in. So there are things on our civilizational to-do list, if we want to have our cake and eat it too. But for God's sake man, have some cake.

History, and money, and penicillin, were on the side of the people who let their kids learn to read.

Words That Shouldn't Exist (Part 1)

Sir Jonathan Erfe is a raging flaming amazon queen roman catholic linguist in possession of what is easily the sharpest mind I've communed with in an academic setting. He asked me to validate some of his research. In order to validate his research, I first had to be taught about linguistic aspect. He taught me. God it was a pain in the ass, but I learned something cool about linguistics, and something equally cool about the enslavement of humankind.

If I don't have a picture in a blog post, the preview doesn't look as cool on your feedly page. So here, have Lupe Mendoza's Uncanny Familiarities.

Verbs belong to aspectual categories. Aspect isn't explicitly marked in English, so you have to use tests to figure out which category a verb belongs to. For example, stative verbs are involuntary, non-activity things. One of the tests you can use to figure out if a verb is stative is whether or not it makes sense to command someone to do it.

“Hey man, start running! Now!” makes sense.

“Hey dude, want spaghetti! Now!” sounds weird.

(There are other tests to figure out other categories. If you stop in the middle of singing, have you sung? If you stop in the middle of buying a car, have you bought a car? These are kind of fun until your head hurts after doing it for eight hours and you start hallucinating and thinking that loving is an accomplishment and swimming is a state of being.)

So Sir Erfe's got this hypothesis that Filipino learners of English initially apply the progressive marker “-ing” to certain aspectual categories and not others. It's actually freaky weird subconscious stuff that makes you look at the sky and scream “WHY!?!?”, and that's my favorite kind of research.

So we're going through lists and marking verbs for aspect, and we sort of trip over a word: believe.

W-w-wait. Wait. No. I've got it. It's like “think”, man. Not like “I'm thinking”, but like “I think so.” I believe so. Creer, croire, credo, whatever. Slam dunk. Right? Come on.

The creeds.

I think that there's one God, the Father, the Almighty . . .
I think that there's one Son . . .

Oh my God. (Wait lang.) There's something fishy happening here.

Motherfuckers did it again.

I've added “believe” to my List of Words that Shouldn't Exist.

Hopi Indians can't tell yellow from orange. Give them a stack of five colored cards and tell them to memorize the sequence, and they won't put yellow and orange in the right places any better than chance would have it. It's because they don't have separate words for yellow and orange. Words structure our brains and constrain our array of possible thoughts. (This is called the Sapir-Whorf hypothesis, and it's mainstream, and the reason for the drive behind salvage ethnography and language revitalization: there might be ways of thinking out there that are closed off to us, and letting languages die off could mean throwing away the meaning of life and the cure for cancer.)

Marshall Rosenberg says that when civilization was invented and we all got enslaved, our languages actually got messed up to make us better slaves. I buy it. (And you, my friend, should buy Nonviolent Communication.)

So, dumb kid that I am, I'm keeping a list.

Words that Shouldn't Exist. 


  1. Should. - (Do please read on before you send me emails about moral decline and impending holocausts.) I like should, in some contexts. "If you really want to make money, buddy, you should get into computational statistics." See now? I like that just fine. That's wonderful. Should tells us that Thing A is required for Thing B. That's a pretty damned useful word. Sometimes we don't even have to say Thing B, because it's implied. "Dude, you should ask for her number!" . . . because I bet she'll give it to you . . . and then you'll go on a date, and you'll both be happy and life will be fun. Cool.

    But sometimes the objective isn't implied. This is the ancient mind control twist on should. Children should be seen and not heard, etc, whatever. Putting some amorphous oughtness in the air that puts demands on people is a great way to make slaves out of your wife and children and adherents to your new religious cult. It would sound like babbling crazy talk to someone whose language hadn't already been doctored. "Children should be seen and not heard because I'm embarrassed of them in front of my friends." Now that's a healthy sentence. Oh, but now it's also my fault. And it also kind of sounds mean. Killing the word should will do that. A lot of time it gets replaced with "I want." Accountability can be uncomfortable. You should try it. It can make all your relationships blissful, if you've got the balls for it. (Again, seriously, buy NVC.)

    Wait, am I saying should shouldn't exist? Yeah. Good point. What I mean is that I think if you kill this word you'll be happier.
  2. Ought/need to/must - See Should.
  3. Believe. - Where the Spanish say "believe" (creo), we say "think." (It gets sloppy because we have two thinks, one for pondering and one for believing and opining.) "I think the meeting is on Tuesday." It's stative. It's involuntary. You don't say "Think that I'm pretty! Now!" It would be convenient if you could. That's why they invented believe.

    To an un-tortured brain thinking in an un-doctored language, putting a moral should on what someone thinks is true is nonsensical and comical. People can't change what they think is true. They can with new evidence or pondering, but they can't (without serious trauma) just voluntarily decide that there are three lights on the wall and not four. Civilization (being that near-simultaneous onset of church, state, school and factory) did change the way we think (yes, through some serious trauma). It also came up with a word for the new way we were supposed to "think": believe.

    Believe is an "I think so" that you're supposed to be able to control at will. To believe some things is wrong, and to believe others is right. It's the criterion for salvation in some creeds. Hellfire awaits those who don't have the right think so.

    (*Note, the Romance languages didn't dodge the bullet: they have a casual credo and a doctored credo, just like we do.)
  4. Phone. - Wuhahaha. Yeah, words blind us to the technological singularity and supremely fuck up our monetary inflation metrics. But that's for another post.

    18DEHVMSfKHJyLmoKEcrorVKpJnzv1DbTJ

Saturday, April 6, 2013

Bitcoin and Esperanto

In 2009 1887, a man working under the pseudonym Satoshi Nakamoto Doktoro Esperanto unleashed an idea, simple and complex, a feat of genius, which stood an unsettlingly real chance of changing the world, making us more prosperous, defying the very conception of the nation-state, and ending all wars.

It was a weird era, in a lot of ways. And my favorite. It was a lot like the present time.

During the ascendancy of Esperanto, in an anarchist revolution taking place in Spain, centered in a city full of weird futuristic surrealist architecture, this Esperanto propaganda poster was made. Yeah, I was born too late, or too soon, or something.


Esperanto is a constructed language that's ridiculously easy to learn. It's made out of the common word roots from big European languages, with more novel, perfectly regular, logical grammatical affixes.

Let me showcase it for just a second:

I love her.
Mi amas ŝin.

She loves me.
Ŝi amas min.

I will love her.
Mi amos ŝin.

She will love me.
Ŝi amos min.

No conjugation for person. Change the "a" to an "o" to make the future. Change it to an "i" for the past. You've been studying Esperanto for 30 seconds, and you already have a better command of the grammar than you might after months of studying Spanish. That's the point.
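The whole rule fits in a handful of lines, which is sort of the point. This is a toy sketch: `conjugate` and `sentence` are invented names, and the one thing added beyond the examples above is the accusative "-n" that turns "ŝi" into "ŝin" and "mi" into "min" when someone is the object.

```python
# Tense endings: one per tense, the same for every person.
ENDINGS = {"present": "as", "past": "is", "future": "os"}

def conjugate(stem, tense):
    """Attach the tense ending to a verb stem, e.g. 'am' -> 'amas'."""
    return stem + ENDINGS[tense]

def sentence(subject, stem, tense, obj):
    """Build subject-verb-object; the object takes the accusative -n."""
    return f"{subject} {conjugate(stem, tense)} {obj}n."
```

`sentence("Mi", "am", "future", "ŝi")` gives "Mi amos ŝin." There's no lookup table of irregular verbs because there aren't any.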

Imagine a language you can get conversational with in the space of a couple months. Imagine everyone on earth took a couple months to learn it.

Imagine that by this autumn you could understand Iranian newspapers and soap operas. I have a feeling that Persians would start to become so eerily human to us that the thought of going to war with them would start to feel as horrible and insane as, I don't know, going to war with England.

Tons of people got behind Esperanto. The anarchists started publishing journals in it. Smart people started saying that it would take over the world in a small amount of time, and they weren't crazy. It looked that way.

A lot of tragedy and poetry interfered. The Nazis didn't like it. Doktoro Esperanto was, in fact, a Jew. Nationalism got big in Europe in the first half of the 20th century. It sucked.

Dr. Esperanto's daughter, Lidia, was an early convert to the appropriately universalist and syncretic Baha'i faith. Taught both the language and the faith. Died in the Treblinka extermination camp.

Esperanto's exponential trend leveled off and stayed put. The internet is connecting would-be Esperantists and maybe giving it another chance, but smart people are no longer writing editorials in newspapers saying it will be the world language in 10 years.

What happened?

Lots of stuff.

What's my point?

Just because something is easy, would benefit mankind, is ascendant, is blowing up in the news, is logical, and would make everyone richer doesn't mean it'll work out.

There is no teleological attractor driving all things towards perfection. Or if there is, it's a weird one that tolerates setbacks and holocausts. Sometimes there's something good and rational and people still don't adopt it.

Nothing is certain.

But I do prefer to hang out with Those Who Hope.

*Dr Jim LaPeyre, this isn't meant to be the bitcoin post you asked for. This is a different one.

Friday, April 5, 2013

Resource allocation. And paella.

The population of earth is about 7 billion people.

About half of them are the gender you're attracted to. (Unless you're bi.)

3,500,000,000.

So I don't have to actually do research, let's do back-of-the-envelope kludgey math to figure out the age stuff. 65.9% of the world is between the ages of 15 and 64. That's 49 years. Let's call it 50. Let's pretend that the ages within that bracket are distributed evenly. Let's assume that you're in that age bracket, and you're roughly attracted to people within five years of your age, putting a 10-year bubble around you. So there are 461,300,000 people of the right gender and age.

Let's define the word "gorgeous" as 1-in-100 cute (4,613,000).

Let's say that in this huge crazy world of monkeys and typewriters, for any given person's psychological problems and character flaws, around 1 in a million people finds themselves attracted to exactly that (like "man, obesity in effeminate men really turns me on.")

By this awful made-up math, even if you're hideous and annoying and gross, there are four or five gorgeous people who are literally insanely attracted to you.
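For anyone who wants to check the envelope, here is the same chain of guesses as a few lines of Python. Every number is one of the made-up assumptions from above; `estimate` and the key names are just labels invented for this sketch.

```python
def estimate(world_population=7_000_000_000):
    """Walk the back-of-the-envelope chain, one made-up assumption per line."""
    right_gender = world_population / 2       # about half the planet
    in_age_bracket = right_gender * 0.659     # share aged 15 to 64
    near_your_age = in_age_bracket * 10 / 50  # 10-year bubble in a ~50-year bracket
    gorgeous = near_your_age / 100            # "gorgeous" = 1 in 100
    into_you = gorgeous / 1_000_000           # 1 in a million likes exactly your flaws
    return {
        "right_gender": right_gender,
        "in_age_bracket": in_age_bracket,
        "near_your_age": near_your_age,
        "gorgeous": gorgeous,
        "into_you": into_you,
    }

for name, value in estimate().items():
    print(f"{name:>15}: {value:,.1f}")
```

It prints each intermediate figure, so you can swap in your own percentages and watch how fast the final count moves.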

The world is an amazing place.

Every time you have ever had a freakish, desperate craving for paella, someone somewhere was right in the middle of saying "Shit, I accidentally cooked too much paella. Why do I always do this? I hate when things go to waste."


Every time you've had to go from Dallas to Nashville, there was someone else making the same trip that day, with an extra seat.

When I went from Manila (a metropolitan area of 20 million people) to Hong Kong/Macau (15 million) for a conference, there were certainly several people making a mirror trip who would have loved to swap condos, eat from each other's refrigerators, and pay nothing.

There's someone dying to hire someone exactly like you.

Of the 130 million books that have been written, there's one that would blow your mind and change the course of your life more than M. C. Escher's visit to the Alhambra did his.

Every facet of your paradise exists, but you will never make the connections.

We live our lives by chance, picking the major that our local college offers, listening to the music that our friends showed us, and dating the girl we met at the bar, letting most allocations happen by near-chance.

I've probably been a lot more deliberate than most people. I wrote a document classifier that accurately predicts which HN articles I will like. There are layers of automated search and analysis underlying my job hunt, social life, and quest for meaning. But there needs to be much more. For me and for all of us.

I will never hear what would be the most amazing and powerful song I could ever hear. I will never meet my best friend. And that makes me sad.




Thursday, April 4, 2013

Transporter phobia, man as pattern, and all things being subsumed into the neo-Platonic oneness

When someone gets beamed aboard the Enterprise, nobody acts like there's been a murder.



Someone has been dissected, atom-by-atom, and annihilated. I don't know what happens to the matter. Maybe the matter itself gets sent to the Enterprise, where the person is re-constructed. But I always figured that they got put back together using matter that was already on the ship, food replicator style.

But the matter doesn't matter. Nobody thinks it does. You are not your particles. The matter in your body falls off and gets replaced all the time. Pork chops turn into eyelashes. Nobody cares. What we value in a person isn't the stuff, but the signal. Man is pattern.

I think that this is obvious past arguing, but I've seen at least one philosopher take pains to embarrass himself getting it backwards (whilst being destroyed by the ever-correct-but-needing-more-fake-humility Eliezer Yudkowsky).[1]

Sidebar comment:
This one time, when I was like thirteen, I hacked a web server in California, where this one dude was keeping a sort of on-line journal, and he had this flash fiction piece (actually a "movie idea") where the whole world started using transporters but they really took away your soul, and there was one guy left on earth with a soul. I was all like "dude! THIS is why I'm a hacker." It was cool. Then I got banned from my ISP.

Mind uploading (which we're due for this century, if brain-scanning resolution, data storage, and processing power stay on their exponential tracks) works on the same principle as the Star Trek transporter, except with emulation rather than meatspace reconstruction. A model is made of where and how all your neurons are (or atoms, or whatever; whatever resolution you think is necessary), and a computer plays physics and lets things move around and fall where they would, and the product is your digital brain thinking thoughts. (I'm going to forget to mention the implications about free will and cosmological determinism here because there are mean scary people on the internet.)

Most transhumanists are ok with destructive mind-uploading, just like most Star Trek fans are ok with eradicating the instance of the person at the transportation origin. Destructive uploading kind of bothers me, but nails on the chalkboard bother me too, because of some vestigial baboon scream panic thing. Vestigial ape logic is a good enough reason to not take a grill brush to a chalk board (sweet Jesus, I can't believe I even typed that. Yipe, help, shudder), but I wouldn't pass on immortality for it.

Oh. Em. Ef. Jee.

So.

Assertion #1:

It doesn't matter if you destroy Commander Riker so long as you make an exact copy.

Now let's talk about something else.

Ten years ago I was a gay-condemning George Bush Republican seminarian. I'm now a polyamorous pescetarian anarchist. I've changed a lot. My pattern has changed a lot. I didn't die, and most people didn't mourn. The five-year-old you is dead, but it's ok. The process of incremental change that links the old you to the new you makes everyone identify the two people with each other. Without that process, we'd be very upset. If the five-year-old you was eradicated and we conjured a new personality at random to give to your parents, they would be pissed. But we can take you from being that five-year-old to being any end-person, and so long as we have the connecting transition--a process by which this one person became the other--your parents will love the randomest of persons as you. (And again, I have to fight going on a tangent. This time about the illusion of self and Eastern stuff.)

So.

Assertion #2:

It doesn't matter if we destroy you so long as there's a process by which we transition the old you into a new you. That's not death, that's life, and change, and a cool thing.

Ok. So now let's talk about something different.

It's 2162, and humanity has been uploaded. People started out being really different from each other, but in a lot of ways we're getting more and more alike. We've changed our mind architectures, and our minds are no longer just simulated ape brains. They're silicon-based, and really really fast, and that has the effect of making one year of the earth going around the sun enough time for a ton of thinking and changing - subjective millennia.

There was one guy who spoke Latin, and everyone thought that was cool, so they all installed the Latin file. And the Sumerian. And Proto-Austronesian. One guy, back in the meatspace days, had weird intrusive thoughts about dead bodies. Something had been wrong with his brain. It sucked. It made it hard for him to sleep. So he changed it. When he uploaded, he changed his mind to get rid of that, and he also got rid of some weird sexual shame issues that his early-childhood conditioning had given him.

People are starting to be a lot alike. Everyone knows the same stuff (everything that is known) and everyone can do the same stuff (everything that can be done). It seems like this is having the effect of making everyone believe the same stuff and want the same stuff. Since memories can be shared, and after a while everyone shared everything with each other, everyone remembers the same stuff.

We're starting to wonder if we could actually be considered copies of the same person. If my mind-file was deleted, it's less than 0.01% different from everyone else's, so I wouldn't feel very much like I had died. Hell, in the meatspace days, we called losing that much of your mind-file "a good night at the bar".

---===---

Millions of years on, the human mind saturates the galaxy. All is computronium. There is one mind, one body, one universe. The universe is mind. The knower is the known. "I" is everything, and nothing.

 ---===--- 

Ok, back to now.

I don't know what's true or what's of value or what's really going to happen.

But,

Playful Assertion:

Immortality is identification of one's own pattern with the supreme pattern. Connecting your pattern to the ultimate pattern through some gradual process (i.e. living long enough to join the borganism) would sure make it feel a lot less like death though, since we seem to really like that connecting part.

And this all sounds like Plotinus. 

The end.

P.S. If you actually like this crazy-ass rambling shit, you should RSS me. If you don't already use an RSS reader, you should. You just get on feedly.com and start an account and paste my blog URL in, and then I get added to your own bad-ass personal magazine thing. You should also add the OkCupid blog, because that thing is freaking fascinating.

Also, I will give you conversations, naked pictures, or gratitude for bitcoins: 18DEHVMSfKHJyLmoKEcrorVKpJnzv1DbTJ