I just woke up (at the time of starting to write this blog post, not when the draft was published) and read a LinkedIn article describing how, in addition to the privacy violations, Google has gone too far with its paid adverts by making them look almost like the rest of the search results. I have noticed myself accidentally clicking those links too, mistaking them for ordinary results, although I am mostly free of that trouble, because I use the Startpage search engine as a filter.
Well, I do not think that selling adverts is evil, and if selling adverts gives us all the open source software and technology development that Google has given us, then clearly there is a legitimate case for it. However, I do think that there is a huge difference in HOW it is done, and this is where the "early Google" got it right and the "current Google" deserves to lose clients and revenue. So, long story short, here is the business model that I propose, and would even love to implement myself, if someone just paid for my time and the hosting equipment: it is a "Facebook for businesses" that practically competes with LinkedIn, except that unlike LinkedIn there is no mandatory log-in and there are no artificial limits on sending messages or contacting other parties. The main emphasis is on the search engine part, and nobody can pay to get a higher rank. Creating a company account would be free for everybody, and inserting basic data into the account would also be free, but the monetization would come from paid hosting of videos, PDFs, product downloads, etc. The pricing would be based on stored and transmitted data volume, id est it would be correlated with the actual cost of the service. The main sales argument would be a guarantee that if the company web-site is offline, for example under a cyber-attack, then the set of marketing materials displayed from the search engine account stays ONLINE, with greater reliability than the 2015_09 YouTube.
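To make the volume-based pricing idea concrete, here is a back-of-the-envelope sketch. The per-GiB rates are made-up placeholders expressed in cents, purely for illustration, not proposed real prices.

```ruby
# Hypothetical volume-based pricing for hosted marketing materials.
# Both rates below are invented placeholder numbers.
STORAGE_CENTS_PER_GIB  = 5   # per GiB stored per month (hypothetical)
TRANSFER_CENTS_PER_GIB = 2   # per GiB transmitted      (hypothetical)

# Monthly bill in cents for a company account hosting videos/PDFs/downloads.
# The bill tracks the two quantities that actually cost the operator money:
# storage and bandwidth.
def monthly_bill_cents(stored_gib, transmitted_gib)
  stored_gib * STORAGE_CENTS_PER_GIB + transmitted_gib * TRANSFER_CENTS_PER_GIB
end

puts monthly_bill_cents(100, 500)  # 100 GiB stored, 500 GiB served -> 1500
```

The point of keeping the formula this simple is that the price stays visibly proportional to the actual cost of the service, so there is no room for rank-based pricing to creep in.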
Different regions want to censor different services. Google had trouble with the Chinese communist party, some porn sites will probably have difficulties with the various Arab sultans/kings/heads_of_religion, WikiLeaks and Snowden and Manning are classics, not to mention the shadow economy in Soviet-era Estonia, where "capitalist activity" was outlawed, yet essential for an at least somewhat decent living. Therefore, the monetization is possible only partially, id est the freedom of speech and democracy parts must be censored from Chinese clients, etc., but to keep the system running, its infrastructure must allow different "search engine vendors" to connect their services into a single, distributed system, much like different computer owners were able to connect their computers to form the Internet. The idea is that just as there are internet standards like IPv4, IPv6, HTML, HTTP, etc., there might be some standard for connecting different search engines.
The Good News
After writing the last sentence of the previous section I searched Startpage for search engine "standards" and APIs, just to be sure that I do not post nonsense here that is "too obviously" visible as nonsense. I stumbled upon an Apache-licensed C++ project called Gigablast, an open source search engine that has been in commercial development since 2000 and is still run and developed by its original author, Matt Wells. Its source code is about 135MiB when zipped, and the 2015_09_29 version does compile on Debian. However, at first I was not able to run it due to some initialization failure. (Later correction: the issue was that the installation path length limit imposed for Windows was not switched off for Linux, so it ran really well on Linux when installed to a shorter path. Raspberry Pi based installations will probably require some customization, though, because it consumed about 1.5GiB of RAM.)
I have tried, even hosted, YaCy before, but it was so broken that there is no point in working on it. YaCy had tons of features, yet it was unstable: it crashed and sometimes hung. So, nice idea, way to go, but the implementation is hopeless, the instability makes the software impossible to use, and it seems to me that it is always smart to stay away from projects developed by people who do not make feature X stable before moving on to the implementation of the next feature. As for the Freenet Project, Freenet is a very influential "grand daddy" of the P2P-Internet, but the probability of getting a brand new concept right the very first time is slim at best, and the Freenet developers should just write a summary of lessons learned, write a brand new implementation from scratch, and DUMP the old Freenet implementation. Once upon a time I tried to build the Freenet software and noticed that it has been piled together like some quick proof of concept: noble, but lacking essential implementation quality.
I guess that both the YaCy and the Freenet projects tried to solve too many tasks at once. At the inception of the Freenet project there was no Tor Project and no GNUnet, so they tried to compensate for that by other means. In 2015 I think that a search engine can just focus on search and forget about offering P2P connections, and a distributed, censorship-free storage can just focus on storing and listing encrypted blobs, without worrying about some gang of mafioso(explanation) finding the location of the servers by IP-address, etc. So, the old, UNIX-style, tried and still working COMPONENTIZATION is my key-word of choice here. Different, specialized components just have to have some way of interfacing with other components, and that can be facilitated by using the adapter design pattern.
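As a minimal sketch of that adapter idea: the components agree on one small interface, and an adapter wraps any concrete engine into it. The class and method names below (`LegacyEngine`, `lookup`, `search`) are hypothetical, invented just for this illustration.

```ruby
# Some third-party component with an interface of its own.
class LegacyEngine
  def lookup(words)                       # incompatible interface: wants an array
    words.map { |w| "result-for-#{w}" }
  end
end

# The adapter translates the agreed interface (search with a query string)
# into whatever the wrapped engine actually expects.
class SearchAdapter
  def initialize(engine)
    @engine = engine
  end

  def search(query)                       # the common, agreed interface
    @engine.lookup(query.split)
  end
end

adapter = SearchAdapter.new(LegacyEngine.new)
puts adapter.search("open source").inspect
```

The rest of the system only ever talks to `search`, so swapping YaCy-style, Gigablast-style or home-grown components in and out becomes a matter of writing one more thin adapter.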
Sometimes I wonder if all business models depend on some sort of mass effect. For example, renting out a single apartment depends on the fact that someone, either the owner or the renter, lives in or uses the apartment exhaustively enough, and long enough, to cover the huge real-estate costs. The renter just has the opportunity to pick a small piece of the huge lot, a small time-slice of the long time-line. The same goes for loans, where a lot of people pile their small sums into one big pile that is rented out for an interest rate. The outrageous cost of software development and hardware development, not to mention materials science, chemistry and physics, is cut into smaller pieces and distributed over the large quantities of consumers who buy the electronic devices: radios, clocks, phones, door bells, laptops. Maybe the presence of some mass effect is one of the fundamental laws for assembling business models, a bit like a fundamental law of carrying out a proper medieval execution service is that after the executioner, lawyers, heads of church, kings and whoever else have done their thing, regardless of religion or version of Bible/Koran/Talmud/constitution, the victim dies.
Maybe the second fundamental law of business models is that at the end of a "day" there must be some profit and no debt. The "currency" of the profit might vary: it could be gold, dollars, euros, reputation, opportunities for sex, affection, potatoes, bananas, silk, drugs, weapons, medals of honor or some combination of various "currencies", but there must be some "aimed" profit. A third fundamental law of business models might be that there must be some mechanism for avoiding stealing and robbing. I guess that the best mechanism for guarding object X against stealing/robbing is to make object X inherently useless in the hands of the thief/robber.
So, coming back to the business model of the search engine business, the way to "blackmail" payment from companies that depend on the search engine for advertisement is to make the companies host their own corner of the search engine, a bit like the closed source Space Monkey project does. For example, a central judge verifies that no pod lies about its search ranks, and an advert would appear from the pod/machine that hosts the index for the first, highest-ranking non-advertisement links. The more general indices a company hosts, and the more it invests in the computational power of its pod, the more advertisement opportunities it has. The judge software, an aggregator at some specific domain that a person uses as the front page of the network of pods, makes sure that the rankings the pods report match the content. There can be multiple, different judges, i.e. front pages of the pod network, and the Chinese government can have its own version, which is not a bad thing, because just as it was with the blocking of GitHub, if the Chinese version depends on the same traffic that the free versions depend on, they cannot block that traffic. On the other hand, no Google/Bing clusters, nor any Chinese/NSA Big Government facility, can offer as much computational power and storage space as a distributed network can, which eliminates the idea of a "state sponsored special search engine" by economic means.
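The judge's verification step can be sketched in a few lines. This is a toy model under invented assumptions: the scoring function is a deliberately naive term count, and the pod names and record fields are hypothetical; a real judge would re-score only a random sample of each pod's claims.

```ruby
# The reference scoring rule both the pods and the judge are supposed to use:
# here, simply how many times the query term occurs in the document.
def score(content, term)
  content.downcase.scan(term.downcase).size
end

# A pod is honest if every rank score it reported for a term can be
# reproduced by the judge from the actual document content.
def honest?(reported_docs, term)
  reported_docs.all? { |doc| doc[:score] == score(doc[:content], term) }
end

pods = {
  "pod-a" => [{ content: "search search engine", score: 2 }],  # truthful
  "pod-b" => [{ content: "search engine",        score: 9 }]   # inflated rank
}

cheaters = pods.reject { |_, docs| honest?(docs, "search") }.keys
puts cheaters.inspect   # ["pod-b"]
```

Because the judge can recompute any reported score from content it fetches itself, a pod gains nothing by inflating its ranks to win advert placement; it only gets itself flagged.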
If I Were Designing a Search Engine
The frame problem is essentially based on the idea that meaning depends on a context, context depends on data for detecting the context, and that data is not available to any computer unless it runs an operating system process that gathers or uses the data the way humans do, effectively making the killing of that operating system process equivalent to the murder, or violent sleep-induction, of an intelligent being. That explains why Google Translate does not have a chance of offering a proper translation as long as it uses statistical methods. Intuitively it also explains why different people translate the same text from one language to another differently, especially if they have different specialities. The same applies to search engines when scraping and interpreting end user queries.
My proposed architecture would consist of different components. First, one would think up some model for documents. Each document that a spider finds in the determined region of the Wild-Wild-Web would be translated into a record that complies with the document model. The records would be stored in some clustering database, perhaps Titan, Neo4j or RethinkDB. There would be absolutely no crypto or any other security measures, no log-ins, nothing, between the computers that form a search engine cluster. The cluster would reside on a LAN that has multiple gateways which handle the security aspects of things, but within the cluster almost everything is optimized for run-time speed and all of the security related reliability aspects are omitted. The queries from the end users would be translated by the gateways, both to take the security measures into account and to work with the model that the documents in the clustered database are stored with. Strategy-wise I would start the implementation in very modular Ruby and then gradually replace the hot-spots with a C++ implementation.
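The two translation steps above can be sketched as follows. Everything here is an illustrative assumption: the record fields, the crude tag-stripping, and the whitelist-style query filter stand in for a real document model and real gateway hardening.

```ruby
# A fixed document-model record that every spidered page is normalized into.
Doc = Struct.new(:url, :title, :terms)

# Spider side: turn a raw page into a Doc record.
def to_record(url, raw_html)
  text = raw_html.gsub(/<[^>]+>/, " ")              # strip tags, very naively
  Doc.new(url, text.split.first(5).join(" "), text.downcase.split.uniq)
end

# Gateway side: translate an end-user query into the plain terms the
# trusting, security-free cluster understands; anything that is not a
# lowercase alphanumeric token is simply dropped.
def gateway_query(user_input)
  user_input.downcase.scan(/[a-z0-9]+/)
end

doc = to_record("http://example.com", "<h1>Open Source Search</h1> engines")
puts doc.terms.inspect
puts gateway_query("open; DROP TABLE--").inspect    # ["open", "drop", "table"]
```

The design choice this illustrates is that all distrust lives at the gateway: inside the LAN the cluster nodes exchange bare records at full speed, because nothing unfiltered ever reaches them.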
Oh well, I still haven't even completed the first stage of my Silktorrent project, so I'm just dreaming and comparing different ideas here.
Thank You for reading this post. :)
P.S. Some links to some open source search engine related materials and engines.