11.23.2020

REPOST: How to create a crowdsourced and gamified search engine

Original archive link

*This blog is hosted on Blogger, which Google owns, and they literally scrubbed this page from my blog and from the internet!*

Google has an utter and seemingly unbreakable monopoly on search. The only problem is that no one seems to care that much. As long as we can maintain some sort of anonymity, perhaps by not logging in when we search or by using private browsing, we think that is good enough. Besides, search is boring, and no one wants to waste their free time thinking about how to make a better search engine. None of the encrypted or anonymous search engines seem any good. And they aren't. But the thing we have to realize is that neither is Google. We have never experienced amazing search, and we have no idea how amazing it can be. Keep reading and you will learn how to build a search engine that not only gives you literally double the results you are looking for but also makes new content searchable almost instantly, content that would take Google days or weeks to index.


Google is like Encarta. It seemed amazing in its heyday, but only once Wikipedia and its millions of articles came around did we realize how limited Encarta really was with its mere tens of thousands of entries. Google is slow. Google is incomplete. But how can this possibly be? Well, the internet isn't just automatically searchable. Everything that you search has to be indexed by web crawler bots. Every search engine has to run these bots to find things on the internet. These bots feed website data into the search engine's database, and it is that database that Google searches in 0.0002 seconds, not the internet itself. So if Google's bots don't harvest a website's data far in advance of your search, then you won't find anything.

Now this web crawling is a tough business. Everyone thinks that the "secret sauce" of Google is its search algorithm, which is basically the set of rules for how Google orders relevant search results. This isn't it. The algorithm is easy. An algorithm isn't even really needed at all; a simple boolean search is even more powerful and will help you find what you want even better. So why is Google better then? It's the web crawlers. Other search engines simply don't have as much content in their databases to search as Google does. So it isn't about the algorithm at all, it is about botting. Google has the botting power. They likely spend billions of dollars a year on resources to fund their massive, constant web crawling campaigns to map out the web. But like I said, this isn't nearly enough. To really do it well would cost trillions of dollars a year. And you wonder why Yahoo gave up and just uses Bing results? It wasn't because they were too stupid to come up with a good algorithm; they simply couldn't keep up in the botting arms race. Bet you have never heard of any of this, have you? The industry keeps this a closely guarded secret.
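To make the "database, not the internet" point concrete, here is a minimal sketch of a boolean search over a tiny inverted index. The page URLs, the text, and the function names `build_index` and `boolean_search` are all made up for illustration; a real engine would tokenize and store things far more carefully.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def boolean_search(index, must=(), any_of=(), must_not=()):
    """Return URLs containing every `must` word, at least one `any_of` word
    (only if any are given), and none of the `must_not` words."""
    results = set().union(*index.values())  # start from every indexed URL
    for word in must:
        results &= index.get(word, set())
    if any_of:
        results &= set().union(*(index.get(w, set()) for w in any_of))
    for word in must_not:
        results -= index.get(word, set())
    return results

# Hypothetical crawled pages: the "database" the crawlers filled in beforehand.
pages = {
    "https://example.com/a": "open source search engine",
    "https://example.com/b": "closed source search monopoly",
    "https://example.com/c": "gardening tips",
}
index = build_index(pages)

# search AND source AND NOT closed
hits = boolean_search(index, must=["search", "source"], must_not=["closed"])
```

Notice that the query never touches the web. A page that no crawler has added to `pages` simply cannot be found, no matter how good the ranking rules are.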

Google only indexes about 40-70% of the internet, and on top of that its search results are days to weeks old and don't take recent website changes into account.

So how could Google possibly improve? They can't put enough ads on their results to bring in trillions of dollars a year (but boy do they try! Half the page is ads now!). If Google can't even do it, then how on earth can anyone compete when all they have is a single bot that will take thousands of years to index the whole web? Easy. Crowdsource it. No one could have imagined competing with Encarta and its tens of thousands of articles, especially with no money and no manpower. But along came Jimmy Wales, and he upended Encarta overnight by just providing people with one thing... a blank page. He called it Wikipedia. And with no money and no manpower the collective beat out the monopoly... and beat them in spades. So badly that it put Encarta out of business for good.

So how can we crowdsource a search engine? How can we encourage people to use their botting power to help us? Ever heard of Bitcoin? Bitcoin rewards people for using their botting power to scramble (hash) data and verify transactions. The data scrambling always made me curious why all these trillions of dollars in botting resources were being used to mess up data when they could be doing something useful. But if Bitcoin were created by the NSA, for example, then the NSA could be using all that bot power as a digital paper shredder for all the spying they are doing on us, to cover their tracks. Or maybe the global banking cabal is using Bitcoin's data scrambling to cover their tracks when trillions of dollars go missing from the US or the Federal Reserve or one of their other assets. There is a reason Ron and Rand Paul have never been able to audit the Fed: we don't own the Fed, the Fed owns us. The borrower is slave to the lender.

Anyway, the answer is simple. Reward people with bragging rights: give them a unique code for crawling the web for us and contributing their findings to an open source database. It is kind of like a digital trophy. They can use a PHP form to submit URL data entries to our database themselves. When you have millions of people contributing, it must be open source. People aren't going to work for you for free; they are only working for themselves, for the pride of getting their findings into an open source search engine and/or to gain a digital trophy to brag to their friends with. One way to keep a database open source is to perform database dumps every hour or so and make them available for the public to download. That way there can be competitors with mirror search engines that have virtually the exact same search results as your search engine, just an hour or so behind. These mirror search engines can also fork off if they want.
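As a rough sketch of the public dump idea, here is how a dump and a mirror could look if the database happened to be SQLite (that is my assumption for the example, and the `entries` table and its columns are made up too). Python's built-in `sqlite3` module can write the whole database out as SQL text, and a mirror can replay that text to rebuild an identical copy.

```python
import sqlite3

def dump_database(conn):
    """Return the whole database as SQL text that a mirror can replay."""
    return "\n".join(conn.iterdump())

def restore_database(sql_text):
    """Build a fresh in-memory mirror from a dump."""
    mirror = sqlite3.connect(":memory:")
    mirror.executescript(sql_text)
    return mirror

# Hypothetical main database with a single contributed entry.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (url TEXT PRIMARY KEY, metadata TEXT)")
conn.execute(
    "INSERT INTO entries VALUES (?, ?)",
    ("https://example.com/", "Example home page"),
)
conn.commit()

snapshot = dump_database(conn)       # this text is what would be published
mirror = restore_database(snapshot)  # what a competitor's mirror would run
mirror_rows = mirror.execute("SELECT url, metadata FROM entries").fetchall()
```

Running `dump_database` on a schedule (a cron job every hour, for instance) and posting the output somewhere public is all the "open source" part needs. The scheduling itself is left out of the sketch.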

So how do we make this work? Here is my take, but there can be other ways. Once a user submits a URL and its associated metadata and/or full text index, a couple of things need to happen. First we need to check whether the URL they are submitting is already in our database. If it isn't, then we check that their submission is correct (we need to crawl the URL ourselves and double check), and then we can let their submission stand (and generate a shortened URL for that URL for them; we will get back to this). If the URL is in the database already, then we need to see if the current entry is exactly correct. If it is not exactly correct (which means the site has been updated), then we need to verify that the new submission is exactly correct, and then we can let it stand.
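The checks above can be sketched in a few lines. This is just one way to read the rules, not a finished design: the database is a plain dict, `crawl` is whatever function we use to fetch a page's current metadata ourselves, and the status strings are names I made up. The sketch also assumes that a submission for an entry that is already exactly correct is simply ignored.

```python
def handle_submission(database, url, submitted, crawl):
    """Decide whether a submitted entry may stand.

    database:  dict mapping url -> stored metadata (stand-in for the real store)
    submitted: the metadata the contributor sent in
    crawl:     callable that fetches the page's current metadata ourselves
    """
    live = crawl(url)            # our own double check of the page
    stored = database.get(url)
    if stored is not None and stored == live:
        return "unchanged"       # current entry is still exactly correct
    if submitted != live:
        return "rejected"        # submission does not match the real page
    database[url] = submitted    # new URL, or the site changed and they caught it
    return "added" if stored is None else "updated"

# Hypothetical live web, faked with a dict so the example runs offline.
live_pages = {"https://example.com/": {"title": "Example"}}

def fake_crawl(url):
    return live_pages[url]

database = {}
url = "https://example.com/"
first = handle_submission(database, url, {"title": "Example"}, fake_crawl)
repeat = handle_submission(database, url, {"title": "Example"}, fake_crawl)

live_pages[url] = {"title": "Example v2"}  # the site gets updated
wrong = handle_submission(database, url, {"title": "Spam"}, fake_crawl)
update = handle_submission(database, url, {"title": "Example v2"}, fake_crawl)
```

Here `first` is "added", `repeat` is "unchanged", `wrong` is "rejected", and `update` is "updated". The only crawling we pay for is one fetch per submission, which is the cheap part compared with discovering the URL in the first place.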

Once this is done, a new shortened URL is given back to the person who just submitted it (which means the original URL contributor is now cut off from the future of the entry... or their submitted data is tied to a historical snapshot of the site), along with a passkey to access the digital trophies that the shortened URL generates. All traffic generated by the search engine that pertains to that URL will go through the shortened URL. This means we can track how much traffic is generated.

For every so many hits you can award that person so many trophies. For example, every 1000 hits earns 1 Glow trophy, or every 10 hits earns 1 Gleam trophy. They could choose whether to cash out 30 hits for 3 Gleam trophies or keep saving up for a Glow trophy. You could encourage people not to cash out until they reach a Glow by offering the Glow at a discount, say for 750 or 900 hits instead of 1000. They could then convert their Glow trophy into 100 Gleam trophies, which is more than the 75 or 90 Gleam trophies those hits would have bought directly, so they reap a greater profit.

To claim the trophies, they log in with their shortened URL and passkey, see how many trophies they have earned, and claim them by viewing them. Preferably they would keep their own spreadsheet of all their shortened URLs and passkeys. When they log in to any one of their shortened URLs they can click "view trophies". The trophies are created at that point, but the codes can only be viewed temporarily. Once they view their trophies the codes quickly disappear from sight, so the person must copy them and paste them into a document, like a spreadsheet, that they can save.
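Here is the trophy arithmetic as a small sketch. The rates come from the examples above, and I picked 900 hits as the discounted Glow price (750 would work the same way). The function names are just for illustration.

```python
HITS_PER_GLEAM = 10
DISCOUNTED_HITS_PER_GLOW = 900  # assumption: the post floats 750 or 900
GLEAMS_PER_GLOW = 100

def cash_out_gleams(hits):
    """Spend hits on Gleam trophies; return (gleams, leftover_hits)."""
    return divmod(hits, HITS_PER_GLEAM)

def cash_out_glows(hits):
    """Spend hits on Glow trophies at the discounted rate;
    return (glows, leftover_hits)."""
    return divmod(hits, DISCOUNTED_HITS_PER_GLOW)

def glows_to_gleams(glows):
    """Convert Glow trophies into Gleam trophies."""
    return glows * GLEAMS_PER_GLOW

small_cash_out = cash_out_gleams(30)      # cash out early: (3, 0)
direct_gleams, _ = cash_out_gleams(900)   # 900 hits spent directly on Gleams
glows, leftover = cash_out_glows(900)     # the same 900 hits saved for a Glow
gleams_via_glow = glows_to_gleams(glows)  # then converted back to Gleams
```

Saving up pays: 900 hits buy 90 Gleam trophies directly, but the same 900 hits buy 1 Glow that converts into 100 Gleam trophies.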

You could request that the password (passkey) of the shortened url be changed (and emailed OR viewed instantly) if you know your current passkey.

A trophy is simply a key code.  It doesn't have an account or necessarily a person it belongs to.  It is just a trophy.  Someone could perform a random key change for their digital trophy's code if they don't like the look of their current trophy code (or batch of trophies) if they want and have the new one emailed to them OR viewed instantly.
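A trophy that is "simply a key code" could look something like the sketch below. The `TrophyRegistry` class and its method names are made up for the example, and it keeps the valid codes in memory in plain form just to stay short. Codes are generated with Python's `secrets` module so they can't be guessed.

```python
import secrets

class TrophyRegistry:
    """Keeps track of which trophy codes are currently valid.

    Note there are no accounts here: whoever holds a code holds the trophy.
    """

    def __init__(self):
        self._codes = set()

    def mint(self):
        """Create a new trophy and return its code (16 hex characters)."""
        code = secrets.token_hex(8)
        self._codes.add(code)
        return code

    def is_valid(self, code):
        return code in self._codes

    def rekey(self, old_code):
        """Swap a trophy's code for a fresh random one."""
        if old_code not in self._codes:
            raise KeyError("unknown trophy code")
        self._codes.remove(old_code)
        return self.mint()

registry = TrophyRegistry()
trophy = registry.mint()
new_trophy = registry.rekey(trophy)  # holder didn't like the look of the code
```

After the rekey the old code stops working and only the new one counts, so the total number of trophies in existence never changes.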

Alternatively you could make your reward just be a form of forum currency for people to swap that they are not allowed to do any RMT (real money trading) with.  Or you could make a cryptocurrency like bitcoin.  But personally I just like the simplicity and elegance of a digital trophy, it just works.

Now that could work. The only problem is that there seems to be a lot of double checking that needs to happen. While this leverages the botting power of the internet, and our double checking effort should be insignificant compared to the enormous difficulty of finding new URLs, it still requires a lot of work. Perhaps we could leverage millions of dollars' worth of botting power to be as effective as trillions of dollars' worth, but we still don't have millions of dollars. What we could do is just allow people to have free edit over the database. I don't want to make Wikipedia's mistake of allowing anyone to delete stuff, so I think it should be "add only": each person "owns" the URL they added to the database, and only they can edit the associated metadata/full text index. They would get trophies for the traffic their URL generates, so it is in their best interest to keep the entry updated so they generate as much traffic as possible. Alternatively we could go full wiki and allow people only to add URLs, not delete them, but allow anyone to change the associated metadata/index. It would be hard to give trophies for anything in this scenario, because if the person who submitted the URL (analogous to the person who created a wiki page on Dwight Schrute) is given trophies for how many hits they get, it would be in competitors' best interest to sabotage the metadata/index for that URL so they can reduce the number of trophies the URL submitter is earning. But it could still work by letting people submit entries for free and getting rid of trophies, like Wikipedia does.

Alternatively, we could stick with the method where only the person who submits the URL can change the data associated with that URL. Then we could allow people to petition to claim a URL someone else has already claimed by proving they actually own the URL themselves. They could also petition to claim the URL by proving that the current data in the database associated with the URL is inaccurate or has been significantly outdated for more than a week or so. These two scenarios are pretty much the same: the person could prove they own the URL by changing the page's metadata and showing that the current entry is outdated. They could document that at a certain datetime the metadata of a page is different from the current search result in the search engine. If the true owner of the URL is booted for letting their database entry become outdated, they can reclaim the URL by proving they own it again. This will keep people on their toes and keep them updating the database as the site is updated.
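The "outdated for more than a week or so" rule can be written down as a tiny check. This is only a sketch of the staleness route: the function name `claim_allowed` is made up, exactly 7 days stands in for "a week or so", and how the petitioner proves when the page changed is left open.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=7)  # assumption: "a week or so" taken as 7 days

def claim_allowed(stored_metadata, live_metadata, page_changed_at, now):
    """A petition succeeds if the stored entry no longer matches the live
    page and the page changed more than GRACE_PERIOD ago."""
    if stored_metadata == live_metadata:
        return False  # entry is accurate, current owner keeps it
    return now - page_changed_at > GRACE_PERIOD

now = datetime(2020, 11, 23)
stored = {"title": "Old title"}
live = {"title": "New title"}

stale_claim = claim_allowed(stored, live, datetime(2020, 11, 1), now)
fresh_claim = claim_allowed(stored, live, datetime(2020, 11, 20), now)
accurate_claim = claim_allowed(live, live, datetime(2020, 11, 1), now)
```

Only the first petition goes through. The second page changed three days ago, so the current owner still has time to update the entry, and the third entry is accurate, so there is nothing to claim.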

The key for people who don't actually own the domain but do own the current entry in the search engine will be to cash out their trophies as soon as possible, so that if the true owner claims the entry they have already taken out all the trophies. Otherwise the new owner will gain all the unclaimed trophies the previous owner earned. It can be kind of like a game to raid and gain trophies. It is gamification of a search engine.

In any of those cases when someone else proves that the current owner of the url entry isn't keeping it accurate, then the shortened url associated with the url would be changed - and the passkey to access the shortened url's unclaimed and future trophies would also be changed and given to the new person who is now the new owner and sole editor of that entry.
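A sketch of that handover, with made-up names (`Entry`, `new_entry`, `transfer_ownership`) and hits standing in for not-yet-claimed trophies. The short code and the passkey are both replaced with fresh random values from Python's `secrets` module, so the old owner is locked out while whatever has not been cashed out carries over.

```python
import secrets
from dataclasses import dataclass

@dataclass
class Entry:
    url: str
    short_code: str          # the part after the slash in the shortened URL
    passkey: str
    unclaimed_hits: int = 0  # traffic not yet cashed out for trophies

def new_entry(url):
    """Create an entry with a random short code and passkey."""
    return Entry(url, secrets.token_urlsafe(6), secrets.token_urlsafe(16))

def check_passkey(entry, attempt):
    """Compare passkeys in constant time."""
    return secrets.compare_digest(entry.passkey, attempt)

def transfer_ownership(entry):
    """Rotate the short code and passkey; unclaimed hits stay with the entry.
    Returns the new credentials to hand to the new owner."""
    entry.short_code = secrets.token_urlsafe(6)
    entry.passkey = secrets.token_urlsafe(16)
    return entry.short_code, entry.passkey

entry = new_entry("https://example.com/")
old_code, old_passkey = entry.short_code, entry.passkey
entry.unclaimed_hits += 25  # traffic arrives before the old owner cashes out

new_code, new_passkey = transfer_ownership(entry)
```

After the transfer the old passkey fails `check_passkey`, the new one passes, and the 25 unclaimed hits now belong to whoever holds the new credentials. That is exactly why the previous owner wants to cash out early.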

I think that latter case is the best case for us right now. It is a little sad that only one person is allowed to update an entry, but unlike Wikipedia, where articles can be miles long, this is just metadata and/or a full text index, which are more objective and verifiable facts.

Stay tuned for more discussion on this topic and find the "follow by email" box on this page to get an email whenever I post new content.  Thanks!  
-NatureHacker
