*This blog is hosted by Google, and they literally scrubbed this page from my Blogger and from the internet!*
Google... they have an utter and seemingly unbreakable monopoly on
search. The only problem is that no one seems to care that much. As
long as we can maintain some sort of anonymity, perhaps by not logging
in when we search or by using private browsing, we think it is good enough.
Besides, search is boring, and no one wants to waste their free time
thinking about how to make a better search engine. None of the
encrypted or anonymous search engines seem any good. And they aren't.
But the thing we have to realize is that neither is Google.
We have never experienced truly amazing search, and we have no idea how
amazing it can be. Keep reading and you will learn how to achieve a
search engine that not only gives you literally double the results you
are looking for, but also instantaneous new content that would take
Google days or weeks to make searchable.
Google is like Encarta. It seemed amazing in its heyday, but only once
Wikipedia and its millions of pages came around did we realize how
limited Encarta really was with its mere tens of thousands of entries.
Google is slow. Google is incomplete. But how can this possibly be?
Well, the internet isn't just automatically searchable. Everything you
search has to be indexed by web crawler bots. Every search engine
has to deploy these bots to go out and find things on the internet for it.
These bots feed website data into the engine's database, and it is that
database Google searches in 0.0002 seconds, not the internet
itself. So if Google's bots haven't harvested a website's data far in
advance of your search, you won't find anything.
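To make that crawl-then-search split concrete, here is a minimal sketch in Python using only the standard library. The table layout and function names are my own illustration, not anything Google actually runs; a real engine would tokenize pages into an inverted index rather than storing raw HTML.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Pulls the <title> text out of a fetched page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

db = sqlite3.connect("index.db")
db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)")

def crawl(url):
    """The bot's job: fetch the page and feed its data into the database."""
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    parser = TitleParser()
    parser.feed(html)
    db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", (url, parser.title, html))
    db.commit()

def search(term):
    """The search itself never touches the internet, only the database."""
    return db.execute("SELECT url, title FROM pages WHERE body LIKE ?",
                      ("%" + term + "%",)).fetchall()
```

Notice that `search()` can only ever return pages that `crawl()` already visited; that is the whole point.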
Now this web crawling is tough business. Everyone thinks that the
"secret sauce" of Google is its search algorithm. Basically, that is
the set of rules for how Google orders relevant search results. This
isn't it. The algorithm is easy. An algorithm isn't even really needed
at all; a simple boolean search is even more powerful and will help you
find what you want even better. So why is Google better then? It's the
web crawlers. Other search engines simply don't have as much content
in their databases to search compared to Google. So it isn't about the
algorithm at all, it is about botting. Google has the botting power.
They likely spend billions of dollars a year in resources to fund their
massive, constant web crawling campaigns to map out the web. But like I
said, this isn't nearly enough. To really do it well would cost
trillions of dollars a year. And you wonder why Yahoo gave up and just
uses Bing results? It wasn't because they were too stupid to come up
with a good algorithm; they simply couldn't keep up in the botting arms
race. Bet you have never heard of any of this, have you? The industry
keeps this a closely guarded secret.
Google only searches about 40-70% of the internet, and not only that,
their search results are days to weeks old and don't take recent
website changes into account.
So how could Google possibly improve? They can't put enough ads on
their results to get trillions of dollars a year (but boy do they try!
Half the page is ads now!). If Google can't even do it, then how on
earth can anyone compete when all they have is a single bot that would
take thousands of years to index the whole web? Easy. Crowdsource it.
No one could have imagined competing with Encarta and its tens of
thousands of articles, especially with no money and no manpower. But
along came Jimmy Wales, and he upended Encarta overnight by just
providing people with one thing... a blank page. He called it Wikipedia.
And with no money and no manpower the collective beat out the
monopoly... and beat them in spades. So badly that it put Encarta out of
business for good, with no shot whatsoever.
So how can we crowdsource a search engine? How can we encourage people
to use their botting power to help us? Ever heard of Bitcoin? Bitcoin
rewards people for using their botting power to scramble data and verify
transactions. The data scrambling thing always made me curious: why are
trillions of dollars in botting resources being used to mess up data
when they could be doing something useful? But if Bitcoin were created
by the NSA, for example, then the NSA could be using all that bot power
as a digital paper shredder for all the spying they are doing on us, to
cover up their tracks. Or maybe the global banking cabal is using
Bitcoin's data scrambling to cover their tracks when trillions of
dollars go missing from the US government or the Federal Reserve or one
of their other assets. There is a reason Ron and Rand Paul have never
been able to audit the Fed: we don't own the Fed, the Fed owns us. The
borrower is slave to the lender.
Anyway, the answer is simple: reward people with bragging rights. Give
them a unique code for crawling the web for us and contributing their
findings to an open source database. It is kind of like a digital
trophy. They can use a PHP form to submit URL data entries to our
database themselves. When you have millions of people contributing, it
must be open source; people aren't going to work for you for free, they
are only working for themselves, for the pride of getting their
findings into an open source search engine and/or gaining a digital
trophy to brag to their friends with. One way to keep the database open
source is to perform database dumps every hour or so and make those
available to the public to download. That way there can be competitors
with mirror search engines that will have virtually the exact same
search results as your search engine, just an hour or so behind. These
mirror search engines can also fork off if they want.
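Here is a rough sketch of what that hourly dump could look like, reusing the sqlite database from the earlier sketch. The file layout and names are just one possibility.

```python
import os
import sqlite3
import time

def dump_database(db_path="index.db", out_dir="public_dumps"):
    """Write a full SQL dump that any mirror can replay to clone the engine."""
    os.makedirs(out_dir, exist_ok=True)
    db = sqlite3.connect(db_path)
    stamp = time.strftime("%Y%m%d-%H00")  # one file per hour
    out_path = os.path.join(out_dir, f"search-dump-{stamp}.sql")
    with open(out_path, "w", encoding="utf-8") as f:
        for line in db.iterdump():  # emits CREATE TABLE and INSERT statements
            f.write(line + "\n")
    return out_path
```

A mirror just downloads the latest file and replays it against an empty database, and it is then at most an hour behind.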
So how do we make this work? Here is my take, but there can be other
ways. Once a user submits a URL and associated metadata and/or full
text index, a couple of things need to happen. First we need to check
whether the URL they are submitting is already in our database. If it
isn't, then we verify that their submission is correct (we need to
crawl the URL ourselves and double check) and then we can let their
submission stand (and generate them a shortened URL for that URL; we
will get back to this). Or, if the URL is in the database already, then
we need to see if the current entry is exactly correct. If it is not
exactly correct (which means the site has been updated), then we verify
that the new submission is exactly correct, and if so we let it stand.
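A minimal sketch of that decision flow, assuming in-memory stores and a hypothetical `fetch_metadata()` helper standing in for whatever crawl-and-extract step the engine uses:

```python
import secrets

database = {}    # url -> the metadata/index we currently consider correct
short_urls = {}  # url -> (shortened url code, passkey)

def fetch_metadata(url):
    """Placeholder: crawl `url` ourselves and return its metadata/full text index."""
    raise NotImplementedError

def handle_submission(url, submitted):
    actual = fetch_metadata(url)  # always double check the claim ourselves
    if submitted != actual:
        return "rejected: submission does not match the live page"
    if url not in database:
        # Brand new URL: store it and mint the contributor's credentials.
        database[url] = submitted
        short_urls[url] = (secrets.token_urlsafe(6), secrets.token_urlsafe(16))
        return "accepted: new entry"
    if database[url] == submitted:
        return "rejected: entry is already exactly correct"
    # The site has been updated and the submitter caught it first.
    database[url] = submitted
    return "accepted: entry refreshed"
```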
Once this is done, a new shortened URL is given back to the person who
just submitted it (which means the original URL contributor is now
cut off from the future of the entry... or their submitted data is tied
to a historical snapshot of the site), as well as a passkey to access
the digital trophies that the shortened URL generates. All traffic
generated by the search engine that pertains to that URL will go through
the shortened URL. This means we can track how much traffic is
generated. For every so many hits, you can award that person so many
trophies. For example, for every 1,000 hits they get 1 Glow trophy, OR
for every 10 hits they get 1 Gleam trophy. So they could choose whether
to cash out 30 hits for 3 Gleam trophies or keep saving up for a Glow
trophy. You could encourage people not to cash out until they get a
Glow by letting them get a Glow trophy at some lesser number of hits,
say 750 or 900. They could then convert their Glow trophy into 100
Gleam trophies (worth 1,000 hits at the normal rate), effectively
helping them reap a greater profit.
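This little sketch just encodes the stated rates (10 hits per Gleam, 1,000 per Glow, a 750-hit discounted Glow, and a Glow converting to 100 Gleams); the function name is hypothetical:

```python
GLEAM_COST = 10      # hits per Gleam trophy
GLOW_COST = 1000     # normal hits per Glow trophy
GLOW_DISCOUNT = 750  # early cash-out price for a Glow
GLEAMS_PER_GLOW = 100

def cash_out_options(hits):
    """Compare cashing out as Gleams now vs. saving for a discounted Glow."""
    gleams_now = hits // GLEAM_COST
    if hits >= GLOW_DISCOUNT:
        # A Glow converts to 100 Gleams, i.e. 1,000 hits of Gleam value,
        # so buying it at 750 hits is roughly a 33% bonus.
        return f"{hits} hits -> 1 Glow (convertible to {GLEAMS_PER_GLOW} Gleams)"
    return f"{hits} hits -> {gleams_now} Gleams now, or keep saving for a Glow"

print(cash_out_options(30))   # 3 Gleams now, or keep saving
print(cash_out_options(750))  # discounted Glow, worth 1,000 hits of Gleams
```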
To claim the trophies, they log in with their shortened URL and passkey,
see how many trophies they have earned, and claim them by viewing them.
Preferably they would keep their own spreadsheet of all their shortened
URLs and passkeys. When they log in to any one of their shortened URLs
they can click "view trophies"; the trophies are created, but the codes
can only be viewed temporarily. Once they view their trophies, the
codes quickly disappear from sight, so the person must copy them and
paste them into a document like a spreadsheet that they can save. You
could also request that the passkey of the shortened URL be changed
(and emailed OR viewed instantly) if you know your current passkey.
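The view-once behavior might look like this sketch, where the server simply forgets the codes the moment they are shown; the storage and names are illustrative only:

```python
unclaimed = {}  # (shortened url, passkey) -> list of pending trophy codes

def view_trophies(short_url, passkey):
    """Return the pending codes exactly once, then forget them."""
    codes = unclaimed.pop((short_url, passkey), None)
    if codes is None:
        return "no unclaimed trophies (or wrong passkey)"
    # From here on the server no longer holds these codes; whoever copied
    # them down is the only one who has them, like a bearer instrument.
    return codes
```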
A trophy is simply a key code. It doesn't belong to an account or
necessarily to a person. It is just a trophy. Someone could perform a
random key change on their digital trophy's code (or a batch of
trophies) if they don't like the look of the current code, and have the
new one emailed to them OR viewed instantly. Alternatively, you could
make your reward just be a form of forum currency for people to swap,
with the rule that no RMT (real money trading) is allowed. Or you could
make a cryptocurrency like Bitcoin. But personally I just like the
simplicity and elegance of a digital trophy; it just works.
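Since a trophy is nothing but a key code, minting and re-keying one is trivial. In this sketch the set of valid codes is the entire ledger and there are no accounts anywhere, matching the description above; the code format is made up:

```python
import secrets

valid_trophies = set()  # every code in this set is a real, spendable trophy

def mint_trophy(kind="Gleam"):
    code = f"{kind}-{secrets.token_hex(8)}"
    valid_trophies.add(code)
    return code

def rekey_trophy(old_code):
    """Swap a trophy code you don't like for a fresh random one of the same kind."""
    if old_code not in valid_trophies:
        raise ValueError("not a valid trophy")
    valid_trophies.remove(old_code)
    kind = old_code.split("-")[0]
    return mint_trophy(kind)
```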
Now that could work. The only problem is that it seems there is a lot
of double checking that needs to happen. While this leverages the
botting power of the internet, and our double checking effort should be
insignificant compared to the enormous difficulty of finding new URLs,
it still requires a lot of work. Perhaps we could leverage millions of
dollars' worth of botting power to be as effective as trillions of
dollars' worth, but still, we don't have millions of dollars. What we
could do is just allow people to have free edit over the database. I
don't want to make Wikipedia's mistake of allowing anyone to delete
stuff, so I think it should be 'add only': each person "owns" the URL
they added to the database, and only they can edit the associated
metadata/full text index. They would get trophies for the traffic their
URL generates, so it is in their best interest to keep the entry updated
so it generates as much traffic as possible. Alternatively, we could
go full wiki and allow people to only add URLs and not delete them, but
allow anyone to change the associated metadata/index. It would be
hard to give trophies for anything in this scenario, because if the
person who submitted the URL (analogous to the person who created a wiki
page on Dwight Schrute) is given trophies for how many hits they get,
it would be in competitors' best interest to sabotage the metadata/index
for that URL to reduce the amount of trophies the URL submitter is
earning. But it could still work by allowing people to submit entries
for free and getting rid of trophies, like Wikipedia does.
Alternatively, we could stick with the method where only the person who
submits the URL can change the data associated with it. Then we could
allow people to petition to claim a URL someone else already claimed, by
proving they actually own the URL themselves. They could also petition
to claim the URL by proving the current data in the database associated
with it is inaccurate, or significantly outdated for more than a week or
so. These two scenarios are pretty much the same: the person could
prove they own the URL by changing the page's metadata and showing the
current entry is outdated. They could document that at a certain
datetime the metadata of a page is different from the current search
result in the search engine. If the true owner of the URL is booted by
letting their database entry become outdated, they can reclaim the URL
by proving they own it again. This will keep people on their toes,
continually updating the database as the site is updated. The key, for
people who don't actually own the domain but own the current entry in
the search engine, will be to cash out their trophies as soon as
possible, so that if the true owner claims it, they have already taken
out all the trophies. Otherwise the new owner will be able to gain all
the trophies the previous owner earned. It can be kind of like a game
to raid and gain trophies. It is gamification of a search engine.
In any of those cases, when someone else proves that the current owner
of the URL entry isn't keeping it accurate, the shortened URL associated
with the URL would be changed, and the passkey to access the shortened
URL's unclaimed and future trophies would also be changed and given to
the new person, who is now the owner and sole editor of that entry.
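A sketch of that takeover step, reusing the hypothetical `fetch_metadata()`, `database`, and `short_urls` stores from the earlier submission sketch:

```python
import secrets

def claim_entry(url, challenger_data):
    actual = fetch_metadata(url)
    if database.get(url) == actual:
        return "claim denied: the current owner's entry is still accurate"
    if challenger_data != actual:
        return "claim denied: the challenger's data is also wrong"
    # The challenger proved the entry is stale and supplied correct data,
    # so rotate both credentials; the old owner loses access to all
    # unclaimed and future trophies for this entry.
    database[url] = challenger_data
    new_short, new_passkey = secrets.token_urlsafe(6), secrets.token_urlsafe(16)
    short_urls[url] = (new_short, new_passkey)
    return f"claim granted: new shortened URL {new_short}"
```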
I think that latter case is the best one for us right now. It is a
little sad that only one person is allowed to update an entry, but
unlike Wikipedia, where articles can be miles long, this is just
metadata and/or a full text index, which are more objective and
verifiable facts.
Stay tuned for more discussion on this topic and find the "follow by
email" box on this page to get an email whenever I post new content.
Thanks!
-NatureHacker