Yet Another Set of Ideas and Observations About Computer Hardware and Software, Episode 1


There are many aspects to evaluating hardware. A computer as such is just a mathematical model, and probably the central idea behind the theory of computation is the following: if computer C1 can run a simulator of computer C2, then C1 can run, inside that simulator, all of the programs that C2 can run, and is therefore at least as capable as C2; and if C2 can in turn run a simulator of C1, then C2 is able to run all of the software that C1 is capable of running and is therefore at least as capable as C1, which makes C1 and C2 equally capable. The rest of the theory of computation seems to be about the construction of the simulators and how to construct as simple a simulator as possible. However, in practice there exists an axis called time, which means that the metal gears of Babbage's Analytical Engine, if there are enough of them, may, in theory, be able to run Linux/Windows/YourFavoriteModernOperatingSystem, but in practice, due to the time constraint, the capabilities of software are limited by the "speed" and reliability of hardware.
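
To make the simulation argument concrete, here is a tiny Python sketch. The three-instruction toy "machine C2" and its instruction names are invented purely for illustration: because "machine C1" (plain Python here) can run the simulator, C1 can run every program written for C2.

```python
# A toy sketch of the simulation argument; the C2 instruction set is hypothetical.
# "Machine C1" is plain Python; because it can run this simulator, it can run
# every program that the toy "machine C2" can run.

def simulate_c2(program, x):
    """Run a C2 program (a list of (opcode, argument) pairs) on input x."""
    acc = x
    for op, arg in program:
        if op == "ADD":      # acc <- acc + arg
            acc += arg
        elif op == "MUL":    # acc <- acc * arg
            acc *= arg
        elif op == "NEG":    # acc <- -acc (argument ignored)
            acc = -acc
        else:
            raise ValueError("unknown C2 instruction: " + op)
    return acc

# A C2 program that computes 3*x + 1, executed on C1 through the simulator.
print(simulate_c2([("MUL", 3), ("ADD", 1)], 7))  # prints 22
```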

Another aspect is the availability of hardware components, which includes price. The more expensive the component, the less available it is, and at some point, like during the Cold War between the United States and the Soviet Union, some components were not available at all, because the technology for them had not yet been invented. As one of the legendary lecturers at the University of Tartu, Ando Ots, once said at his electronics lecture: each generation has its own toys. (His comparison was that when he was young, he played with schematics that were assembled from single transistors, but our generation plays with electronic components that are whole integrated circuits. That was in 2002 or 2003. In 2015 electronics hobbyists use Raspberry_Pi-s and Arduino-s instead of individual circuits, and the individual circuits are considered to "lack features" by the 2015 standards.) From the perspective of a freelancer, or even from the perspective of a corporate developer who is backed by the deep pockets of the corporation, the price of the components is totally outside of the control of the developer, and even if the price is acceptable, the availability of the components is still totally outside of the control of the developer. Therefore for hardware the very same separation-layer design pattern applies, where project-specific code uses as many of its dependencies as possible through a project-specific separation-layer, and the separation-layer then connects to the dependencies, which can be replaced without changing any of the project-specific code that uses the separation-layer. (A sub-section of the bitrary is titled "How to Choose from Crap".)
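
A minimal Python sketch of the separation-layer pattern, with made-up class names and mount paths: the project-specific code talks only to the StorageLayer interface, so the component behind it can be swapped without touching that code.

```python
# A minimal sketch of the separation-layer pattern (class names and paths are hypothetical).
# Project code depends only on the separation layer; the concrete component behind
# the layer can be replaced without changing the project code.

from abc import ABC, abstractmethod

class StorageLayer(ABC):
    """Project-specific separation layer: the only storage API the project code sees."""
    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def load(self, key: str) -> bytes: ...

class SDCardStorage(StorageLayer):
    """One possible dependency behind the layer."""
    def save(self, key, data):
        with open("/mnt/sdcard/" + key, "wb") as f:
            f.write(data)
    def load(self, key):
        with open("/mnt/sdcard/" + key, "rb") as f:
            return f.read()

class USBFlashStorage(StorageLayer):
    """A drop-in replacement; project code does not change when this is swapped in."""
    def save(self, key, data):
        with open("/mnt/usb/" + key, "wb") as f:
            f.write(data)
    def load(self, key):
        with open("/mnt/usb/" + key, "rb") as f:
            return f.read()

def project_code(storage: StorageLayer) -> None:
    # Project-specific code talks only to the separation layer.
    storage.save("settings.bin", b"\x01\x02\x03")
    print(storage.load("settings.bin"))
```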

Analyzability. According to one theory, the reason why spy agencies fail to predict the future, for example, why the American spooks failed to detect the imminent collapse of the Soviet Union, is that the analysts of the spy agencies have an incomplete set of questions that they try to find answers to, and the set of questions is incomplete because all people, including the analysts, have some cultural background that they use for forming the set of questions. The same idea holds for the questions that verification tries to find answers to. Verification and formal methods only tell whether the types of flaws that the developer is looking for are present or absent. It's the testing "in the wild", where the "rubber meets the road", that reveals some of the questions that were missing from the set of questions that were used at the verification step. However, if the verification step is missing, the same mistakes are repeated over and over again. On the other hand, the task of verification, the task of any kind of automated search for flaws, might depend exponentially on the project size, which leads to the requirement that the scrutinized (to avoid choosing between verification and testing) project must be small "enough" for the available computing equipment to carry out the scrutinization within "acceptable" time limits. As for the design pattern where a big problem is divided into a set of smaller problems that are then "small enough" to handle: all of the physical properties of a stone brick might be known, but those do not reveal the different issues that the different kinds of buildings created from those bricks will have. Formally said, knowing everything about the components does not make integration tests useless, because the moment the well-studied components are put together, it's a walk in the unknown. An example from biology is that the Periodic Table might be well known, but biologists can still have difficulties constructing a single insect that eats, reproduces and flies. Yet automated scrutinization is paramount.
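
A deliberately contrived Python sketch of that point (the numbers and function names are made up): both components pass their own unit tests, yet composing them produces nonsense because of a unit mismatch, which only an integration test would reveal.

```python
# Two "well studied" components that are each correct in isolation
# (hypothetical example; the flaw only appears when they are combined).
import math

def brick_height_mm() -> float:
    """Height of one brick, in millimetres."""
    return 65.0

def wall_height(brick_height_m: float, rows: int) -> float:
    """Height of a wall, assuming the brick height is given in METRES."""
    return brick_height_m * rows

# Component-level tests pass:
assert brick_height_mm() == 65.0
assert math.isclose(wall_height(0.065, 10), 0.65)

# Integration is a walk in the unknown: the units do not match,
# and only a test of the combined system reveals the flaw.
print(wall_height(brick_height_mm(), 10))  # 650.0 "metres" of wall -- nonsense
```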

The Interest of Patent Trolls. Due to patent troll activity the SD-cards, generally just called "memory cards", should not be used; USB flash drives should be used instead. The goal of the MP3 file format project was NOT to advance the technological capabilities of the consumer electronics industry, but to blackmail and bully everyone. Instead of using the MP3 file format, which was a tool specially designed for bullying the masses, the Ogg Vorbis or Opus file format should be used.

Energy consumption and computational power signature. The electricity bill is a weird, time-dependent phenomenon, because most of the sunlight does not even hit the planet Earth and there is no shortage of energy in our solar system. Even the little percentage that does hit the Earth keeps the storms roaring, the plants growing and the animals running and jumping. Nonetheless, the heat dissipation requirement keeps the energy consumption issue relevant, because the smaller the CPU, the smaller the volume of matter that produces the heat, and the general requirement is that computers should not melt their vital parts. The heat dissipation requirement stays even after the invention of small, almost-ever-lasting nuclear batteries. The amount of heat generated depends on the number of operations that the CPU has to carry out, and that in turn depends on the efficiency of software. As of 2015_12 I predict that at some point heavily optimized programming languages like C++ will be used for saving the electricity of some very small, insect-like device that has over 1GiB of RAM, which might be some persistent form of RAM, maybe FRAM or some other, newer, form of memory. The development design pattern might be a build system that uses the compiler to keep track of programming language specific inter-file dependencies, asks the operating system for the list of files that were changed after a certain timestamp, and rebuilds only the parts that need to be rebuilt, using the GCC style option "-O0". In 2015_12 the compromise solutions for C++ seem to be boost.build and CMake. As for the computational power signature of a computer system, that can be illustrated by comparing LAN based computing clusters with OpenACC based GPU computing, Parallella Epiphany chips, FPGA-cards and plain-CPU-multithreading. Each of them has its own bottlenecks and limitations that are too extensive to be covered in this blog post. A C++ style software efficiency requirement can also arise from the need to use more radiation tolerant components, which might use bigger and therefore slower on-chip components (transistors, capacitors, etc.) than conventional electronics can use. Chip-integrated error correction blocks can also limit the number of components that are available for implementing speed-optimization features like CPU-caches, vector processing units, etc.
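
A minimal Python sketch of that rebuild idea, under the assumption of a flat list of .cpp files and no header dependencies (a real build system would also ask the compiler, for example via "g++ -MM", for the inter-file dependencies; the file names are hypothetical): only files changed after the previous build's timestamp are recompiled, with the fast, non-optimizing "-O0".

```python
# A minimal sketch (not a real build system) of rebuilding only what changed:
# compare source modification times against the timestamp of the previous build
# and recompile only the stale files with the non-optimizing -O0 flag.

import os
import subprocess

SOURCES = ["main.cpp", "module_a.cpp", "module_b.cpp"]   # hypothetical file names
STAMP = ".last_build_timestamp"

def last_build_time() -> float:
    """Timestamp of the previous build, or 0.0 if there has been no build yet."""
    return os.path.getmtime(STAMP) if os.path.exists(STAMP) else 0.0

def rebuild_stale(sources, since: float) -> None:
    """Recompile only the files that were changed after the previous build."""
    for src in sources:
        if os.path.getmtime(src) > since:
            obj = src.replace(".cpp", ".o")
            subprocess.check_call(["g++", "-O0", "-c", src, "-o", obj])

if __name__ == "__main__":
    rebuild_stale(SOURCES, last_build_time())
    open(STAMP, "w").close()  # record the new build timestamp
```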

As of 2015_12 I believe that an ideal workstation and an ideal server consist of a set of different computers that reside on a fast LAN and form a compute cluster, where most of the computers can boot really fast or at least wake up from power-saving mode in at most a few seconds. After wake-up they would complete their task, stay on for about 10 more seconds and then go back to sleep. In the case of servers there would be one very low-powered computer that services the inbound requests, and as the amount of requests or jobs grows, the low-powered computer can, depending on the load, switch on a few other Raspberry_Pi-like computers, which use "work stealing" by snatching jobs from the jobs queue, and if the jobs queue still stays "too full" for "too long", some "big machine" might also be woken up to help out. The "big machine" would snatch jobs from the general jobs queue the same way the Raspberry_Pi-like computers do. The good news is that the Raspberry_Pi-s take so little power that they can be powered through an ordinary motor driver. The motor driver can be operated by some other Raspberry_Pi-like computer by using its IO-pins or pin analogues.
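
A toy, single-process Python sketch of that job-snatching behaviour (the worker names and thresholds are invented; a real setup would wake physical machines, for example via Wake-on-LAN, instead of starting threads): workers snatch jobs from a shared queue, linger for about 10 seconds after the last job and then "go back to sleep", while a coordinator wakes the "big machine" only when the queue stays too full for too long.

```python
# A toy sketch of the wake-on-demand, job-snatching cluster idea.
# Threads stand in for physical machines; the thresholds are arbitrary.

import queue
import threading
import time

jobs: "queue.Queue[int]" = queue.Queue()

def worker(name: str) -> None:
    """Snatch jobs until the queue has been empty for ~10 s, then 'power down'."""
    while True:
        try:
            job = jobs.get(timeout=10)
        except queue.Empty:
            print(name, "goes back to sleep")
            return
        print(name, "processes job", job)
        time.sleep(0.1)
        jobs.task_done()

def coordinator() -> None:
    """Wake the 'big machine' only if the queue stays too full for too long."""
    threading.Thread(target=worker, args=("pi-worker",)).start()
    too_full_since = None
    big_machine_awake = False
    while True:
        if jobs.qsize() > 20:                       # queue "too full"
            too_full_since = too_full_since or time.time()
            if not big_machine_awake and time.time() - too_full_since > 5:
                threading.Thread(target=worker, args=("big-machine",)).start()
                big_machine_awake = True
        else:                                       # load dropped again
            too_full_since = None
            big_machine_awake = False
        time.sleep(1)
```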

Social processes. It is a fact of life that different people have different psychological properties and different agendas. Oftentimes very different people happen to share a profession, and IT is no exception. Many people are in IT because they had the impression that it is something that pays well or is prestigious. Usually the people who are not in IT for a pure passion for solving technical problems do a really shoddy job, PUBLISH CRAP, and some people, who do prefer to spend their time solving technical problems, might be temporarily, accidentally, on a shoddy team, where it is not possible to do a good job. (I was in such a situation once and I resigned.) Sometimes a good software developer has to clean up the mess of others, which might have been created not because the other developers were shoddy or bad, but because the time given to the other developers for doing the job was inadequate, and the client eventually realized that it actually takes time and effort to do the job satisfactorily, and the good developers, who were put into that shoddy situation earlier, had resigned with the knowledge that their client was just plain immature. Whatever the case, in a world where the reliability of the end product depends on the reliability of its components, shoddy, broken components must not be used. To lessen the degradation of well crafted code, different people must be kept from modifying each other's code. Versions of software, even ones that do not show any symptoms of any flaws, must be considered to be without backwards compatibility, even if backwards compatibility is advertised. As of 2015_12 I see the solution to be the Babel Architecture, which also allows multiple versions of the same library to be run in a single application, in different processes. As for versioning, components MUST NOT be referenced by using the version componentname_newest; all components must be referenced by a specific version, like componentname_SpecificVersionFoo. In an ideal case the requirement to use a specific version instead of the version newest applies also to all tools that are used for building the applications. That is to say, development tools must be considered part of the set of dependencies. It is OK for the development tools to be far less portable than the deployment deliverables. In an ideal world the development deliverables include a virtual machine image with all development tools, dependencies, source code and documentation installed. In a not-so-ideal world the virtual machine image checks from its version specific URL whether there are any newer versions of the source code available that have been tested to be "develop-able" with the given virtual machine image. The virtual machine specific URL might be implemented by symlinking specific versions of source packages from the general list of source packages to the virtual machine image specific URL folders.
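
A small Python sketch of that pin-everything rule (the manifest format, the component names and the "_newest" suffix convention are all made up for illustration): it rejects any dependency, including the build tools, that is not referenced by a specific version.

```python
# A minimal sketch of enforcing "no componentname_newest references"
# (the manifest format and component names are hypothetical).

DEPENDENCIES = {
    "libfoo": "libfoo_1.4.2",
    "libbar": "libbar_newest",        # this one violates the rule
    "build_toolchain": "gcc_4.9.2",   # development tools are dependencies too
}

def check_pinned(deps: dict) -> list:
    """Return the names of components that are not pinned to a specific version."""
    return [name for name, ref in deps.items() if ref.endswith("_newest")]

violations = check_pinned(DEPENDENCIES)
if violations:
    raise SystemExit("unpinned dependencies: " + ", ".join(violations))
```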

Multithreading. I have written a whole chapter on that, but the summary is that there exist operating system threads (processes and the ones created with OpenMP and the like), programming language implementation level threads (like the Go green threads) and application level threads. An example of an application level thread is a web software session. A critical section of an application level thread can be illustrated by a situation where the same user has logged in simultaneously from multiple computers and has the account settings screen open in both sessions. As the GUI-s of both of the sessions should be synchronized, a change to the account settings through one of the sessions should trigger a GUI update in both of the sessions. If mutex-less threads are available, then a mutex or a lock can be implemented by dedicating one thread to the implementation of a mutex instance.
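
A minimal Python sketch of that last idea (the names and the message format are made up): one dedicated thread is the mutex, and the other threads acquire and release it purely by message passing over queues, without any native lock object.

```python
# A minimal sketch: one dedicated thread acts as the mutex.
# Other threads acquire/release only by sending it messages over queues.

import queue
import threading

requests: "queue.Queue[tuple]" = queue.Queue()   # (kind, reply_queue)

def mutex_thread() -> None:
    """The dedicated thread that *is* the mutex: it serialises acquire/release requests."""
    waiting = []          # reply queues of threads waiting for the lock
    held = False
    while True:
        kind, reply = requests.get()
        if kind == "acquire":
            if held:
                waiting.append(reply)              # park the requester
            else:
                held = True
                reply.put("granted")
        elif kind == "release":
            if waiting:
                waiting.pop(0).put("granted")      # hand the lock to the next waiter
            else:
                held = False

def with_mutex(name: str) -> None:
    reply: "queue.Queue[str]" = queue.Queue()
    requests.put(("acquire", reply))
    reply.get()                                    # blocks until the mutex thread grants the lock
    print(name, "updates the account settings")
    requests.put(("release", None))

threading.Thread(target=mutex_thread, daemon=True).start()
sessions = [threading.Thread(target=with_mutex, args=(f"session-{i}",)) for i in range(3)]
for s in sessions: s.start()
for s in sessions: s.join()
```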

Web applications. In an ideal case there are multiple servers in some anonymization network (Tor, GNUnet) and the servers can be out of sync, go off-line and come back on-line at any moment. The client picks any of them at random, preferably writing to more than one server at every write cycle. If the client has a shoddy, "slow" internet connection, then it can upload data to one of the servers and tell the others to pull the data from the server that received it. That way the other servers can get the data through the fast(er) connections between the servers. Most web applications should not implement the user access rights server and its GUI themselves, but use encrypted connections to an authentication server that has some general resource access model, which might be BARS. Due to the fact that all public key encryption algorithms are fundamentally broken, only symmetric key encryption is used. Public key encryption algorithms can be used as symmetric key encryption algorithms, if the key-pair is used as a single private key. Sessions as application level threads should be considered within the application architecture from the very start. To allow messaging between web applications that run in different tabs of a common browser instance, relative addressing should be used instead of absolute addressing. To allow massively multiplayer online games and chat applications to avoid the constant polling of a database instance, the traditional database access model, where web software (PHP, Python, Ruby, Java, etc.) polls the database instance with queries, is replaced with the RethinkDB-style design pattern, where the web software subscribes to the database engine instance by handing it a query, and whenever data is written to or modified at the database instance, the database instance runs all of the queries of its subscribers and sends the query results only to those subscribers that have a non-empty query result. If the speed of queries is the ultimate goal, then instead of using a database engine there should be a C++ servlet that loads all of its data at startup and writes the data out on a background thread and at shutdown. For the rest of the cases SQLite3 or the Titan graph database will probably do. "Native JavaScript" code must always take into account the fact that different CPU-s can have different bit endianness and different byte endianness.
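
A toy Python sketch of that subscribe-with-a-query pattern (this is not RethinkDB's actual API; all names here are made up): every write re-runs the subscribers' queries and pushes the result only to the subscribers whose result is non-empty, so nobody polls.

```python
# A toy sketch of the "subscribe with a query" pattern (not RethinkDB's real API;
# the names are hypothetical). Writes push results to subscribers instead of
# subscribers polling the database.

class TinyPushDB:
    def __init__(self):
        self.rows = []
        self.subscribers = []       # list of (query, callback) pairs

    def subscribe(self, query, callback):
        """query: a function taking the full row list and returning the matching rows."""
        self.subscribers.append((query, callback))

    def write(self, row):
        self.rows.append(row)
        for query, callback in self.subscribers:
            result = query(self.rows)
            if result:                       # notify only on non-empty results
                callback(result)

db = TinyPushDB()
db.subscribe(lambda rows: [r for r in rows if r["room"] == "lobby"],
             lambda result: print("lobby chat update:", result[-1]["text"]))

db.write({"room": "lobby", "text": "hello"})   # the lobby subscriber gets a non-empty result
db.write({"room": "dev", "text": "standup"})   # the lobby query is still non-empty, so it is re-sent
```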

The list is incomplete and can be modified at any time.

Thank You for reading this article. :-)
