Application Software Encapsulation Idea X1 and a Doomsday Computer Architecture Idea D1




The background music for this article originates from a movie titled "DREDD".




(The image can be seen in greater detail by clicking on it.)


The schematic assumes that if an agent, in this case each of the cloned components (the web server, the web application, the web browser), has no ability to permanently save anything anywhere, including outside of the computer that runs it, and the agent is cloned at the start of the session and killed at the end of the session, then a malicious client has no way to leak anything anywhere. The agent still has the ability to ruin the data on the trusted hardware, and the agent, for example the web browser, still has the ability to display things wrongly on the screen. The scheme does not have any countermeasures against the physical bugging of the screen/monitor, keyboard, or mouse. Even if it did, the room, car, or area where the computer is used might be bugged. The most secure human-machine interface is probably a microphone and headphone jack that have been integrated into the secure, application-specific hardware. The malware in the MODEM computer makes Tor useless.

A rule of nature is that the party that can inflict its will on others has its way. The choice of what to inflict on others can be stupid and lead to various losses for the inflicting party, but nonetheless the choice is for the inflicting party to make. As the saying goes: even God and all other strongly positioned dictators bear responsibility, because they have to answer to themselves in terms of not ruining things for themselves. Requiring that all computers be equipped with super-mafia (generally called government) malware can be stupid, but it is still possible.




Countermeasures



Even if perfect hardware were available (no malware, no hardware bugs, no ability to install malware, no ability to place hardware bugs), the rest of the systems around the perfect hardware can still be compromised. There can be drones looking over the shoulder, sand-grain-sized bugs on clothing, etc. Data leak countermeasures are desirable, but whenever possible, they should not be relied on. As always, a graph of failures should be constructed. (Unlike attack trees, the graph of failures can contain loops and events that are not caused by adversarial activity.) In the case of hardware, the only harm that malware and hardware bugs can do, that I'm aware of as of 2016_06_07, is slowing down the hardware (optionally to the point of halting) or destroying it, leaking data, overwriting existing data, deleting existing data, and flooding with generated data. With the exception of hardware slow-down or destruction, the malicious activity of malware and hardware bugs depends on getting access to memory regions where existing data resides or, in the case of flooding, regions that will be read later. Memory "leaks" in the C++ sense classify as hardware slow-down, because if memory runs out, the execution of the application slows down to a halt. A back door in a CPU chip means that the operating system process that runs the malware gets access to some RAM region of some other operating system process without that other process explicitly giving permission for that access. If the data of the victim process is saved, maybe to some shadow region of the RAM or shadow registers in the CPU, then the victim process and the malware process do not have to run concurrently.

A desirable, but least reliable countermeasure is to construct formally verified, secure operating systems (Genode OS, Minix 3). A desirable, but harder to achieve countermeasure is to construct hardware that cannot be tampered with, or where tampering, including the placement of hardware bugs, can be detected and reversed. (Hardware costs money. Super-mafias, the governments, have an almost limitless amount of resources, whilst dissidents have hardly any monetary resources: if the taxation percentage is over 50%, the super-mafia will always have more monetary resources, even if the dissidents were financed by some successful business. When it comes to taxation, dictators and western democracies are allies and the "InterPol" works flawlessly.) One of the reasons why the creation of 100% custom hardware is probably among the most difficult tasks to achieve, even in a situation where one owns his/her own semiconductor foundry, is that the computers where the custom microchips are designed can be compromised, leading to a situation where a totally NON-compromised semiconductor plant produces compromised microchips. From that point of view, maybe the generic FPGA chips that everybody uses, including the super-mafiosi themselves, might be the safest solution, especially if the closed source FPGA is loaded with a design that implements an open source FPGA, as has been done in the form of the Flavia project (archival copy).

Given the relatively small "virtual gate count" of FPGAs, the open source FPGA has to be implemented on a system that consists of multiple closed source FPGAs. That opens up a possibility to run the open source FPGA on hardware that uses FPGA chips from multiple, different vendors. If speed is not that critical, some "very slow corners" of the open source FPGA might run on hardware that is not even an FPGA chip, but a microcontroller that runs software that simulates a very tiny and terribly slow FPGA at some of its universal IO pins. The idea of using a microcontroller for running a slow-as-hell simulation of some more advanced system is not new. The idea behind "Linux on an AVR" (archival copy, author: Dmitry Grinberg) is that an 8-bit microcontroller runs a simulation of a 32-bit CPU and in that simulation the simulated CPU boots Linux that has been compiled for that simulated 32-bit CPU. If FPGAs get banned by law or go off the market due to the fact that engineers, the people who have the brains to care, are hopeless losers from a business point of view, the open source FPGA might be simulated by using a huge amount of general purpose CPUs. As the simulated FPGA is going to be really slow, the operating system that runs on the CPU that is loaded into the simulated FPGA has to be very efficient, essentially inspired by the work of Niklaus Wirth and the hardware optimization ideas (archival copy) that were developed by Russian computer scientists during the Cold War. As of 2016_06_06 my understanding of the main hardware optimization idea that was developed by Андрей Николаевич Терехов (Andrey Nikolaevich Terekhov) is that many hardware level checks can be eliminated by running only binaries that are generated by a compiler that guarantees, by using formal methods, that the error conditions never happen, and that CPUs with lower clock speeds can outperform CPUs with higher clock speeds by using more parallelism. One of the hardware-level parallelism options is the use of Systolic Processors.

Probably the way to utilize the vast amount of terribly slow CPUs is to use the ParaSail programming language. If I remember correctly, then the 2016 implementation of the ParaSail programming language includes a ParaSail to C translator. The translation result, the C code, might be formally analyzed (maybe with mbeddr).




If Malicious Chips Have a Lot of Secret Memory for Storing Their Eavesdropping Data, RAM Region Access Based Countermeasures Fail


For FPGAs and CPLDs the storage might contain a series of all "gate" changes. For RAM the secret storage can consist of dumps of the publicly visible RAM regions. For CPUs the secret storage might contain all CPU IO. Given the sizes of modern Flash and FRAM memories, such logs might be feasible for small, slow, simulated computer systems.

Actually, I came to that conclusion after trying to figure out whether it is possible to construct a memory region separation based secure system from back-doored CPUs and other components on the conditions that none of the components has a "sufficiently" large secret memory, there are no Bluetooth connections, chips of different vendors do not cooperate on eavesdropping, and the function of the pins does not change during the time when the chips are supposed to serve the public owner of the hardware. As of 2016_06_08 I think that I failed to construct a secure system from malicious hardware, but the system that I constructed is described in the next section and, if the hardware components are not malicious, the system has some nice scalability properties.




The Doomsday Computer Architecture Idea D1




Political Introduction


First of all, if You are wondering why the hell I am spending my time on such a stupid project instead of focusing on the myriads of other problems that I have, completing and cleaning up my software projects, etc., then the answer is that I just cannot stand the idea that in the 21st century, where everything depends on information technology, computer hardware and software, the world is literally left to a "few esoteric fools" to save, and that I should just ditch all my dreams, all my desires, because some people feel that they are better off if they just destroy my opportunities and the opportunities of many other small and independently thinking parties as an "irrelevant side effect". That's the reason why I fought against Microsoft's efforts to eliminate freelance software developers and small software development companies from the market. The same now. It is important to me that I am able to provide my software development services even when the hardware market SCREWS UP, like it already has in some ways. Now, You may say that the Raspberry Pi is cheap and its competitors make life even greater, but that's only great within the limits that the super-mafiosi allow it to be. I may be poor, but I am certainly not slave-minded enough to believe that what the "masters" tell me is something that I should genuinely accept as fair, so that's NOT GOOD ENOUGH FOR ME, except that instead of just waving banners I try to actually enforce my position by making it technically sure that my interests are protected. For example, my answer to the Edward Snowden revelations is my own, proper, e-mail encryption software, mmmv_crypt_t1, that is optimized for encryption strength, not execution speed or usability to non-technical users. For years I've also been working on Silktorrent, with a presumption that there have been many similar projects before and they have all failed due to poor modularity, leading to a situation where outdated components, for example crypto-algorithms, communication channels, implementation programming languages, cannot be updated, replaced, swapped.

The Doomsday Computer Architecture D1 gives me, a software developer, a back-up plan for when the market for traditional computer hardware gets ruined. Software is useless without hardware that is capable of running the software. (At first I simplify the task a lot by distinguishing between the hardware that is used for developing the software and the hardware that is used for running the software in the end users' environment.) If You are wondering why I haven't put the Doomsday Computer Architecture D1 into a separate blog post, then my answer is that the preceding text is a necessity for understanding the choices made in the Doomsday Computer Architecture D1. If You are still wondering why I am working on such a project NOW (2016_06_08), when I'm in a hurry with many things, then my answer is that all software architecture related decisions that cannot be changed later must be very carefully thought out, and the ideas behind the Doomsday Computer Architecture D1 are actually more general than just creating computers from "scrap parts". For example, in application software there exist application level threads that are not the same as operating system threads or programming language concurrency based threads. An example of such an application level thread is a log-in session of a web application. Application level threads have many, if not all, of the very same issues that operating system level threads have: race conditions, critical sections, lock-ups. It's nice to have an application software architecture where the application software does not need to be re-written to make it secure (to the point that the hardware and operating system that run it allow it to be secure) and robust (some servers go offline and neither the network administrators nor the original software developers need to know about it). Hardware architecture and operating system architecture give a lot of inspiration for that kind of application software architecture, and I do not want to be the fool who leaves that fine opportunity unused.




The Architecture





The image can be seen in greater detail by clicking on it.
The pink text refers to the fact that the architecture failed to deliver a secure system if the components are "powerful enough" to record the activity of the hardware.


The idea is that instead of consisting of only FPGA chips, the system contains a lot of different CPU chips from different vendors, and each CPU chip forms a CPU-module that can be plugged into any of the CPU-module sockets, regardless of the CPU vendor. The massive amount of data transfer makes the system slow and energy hungry. The system has multiple single points of failure, but the nice thing about the architecture is that it can be cobbled together from parts that are available.

The compiler divides the computation into small chunks, computation microtasks, CmuTASKs, that absolutely every CPU in the system is capable of executing with sufficiently high speed. Instead of using the assembler commands of a single CPU, the compiler's virtual machine assembler commands are translated to sets of CPU-specific assembler commands and those are compiled to CPU-specific binaries for every supported CPU that might exist in the system, so that each CmuTASK contains binaries for every supported CPU type, regardless of whether those CPUs are really present in the given system. The lengths of the binaries for different CPU types may differ. For security reasons the CPU-modules that execute the CmuTASKs are chosen during run-time, in real time, without saving the CPU-module selection choices anywhere. The CPU-modules are chosen by an FPGA/CPLD based custom module that uses its own, personal, on-board thermal noise based random number generator. That's the FPGA/CPLD at the bottom of the hasty draft and I call it the computation microtask shuffler (CmuSHUFFLER).
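To make the shape of such a record easier to picture, here is a minimal C sketch of a CmuTASK. The field names, the CPU type list and the fixed-size slot table are my own illustrative assumptions, not part of any specification of the idea.

```c
/* A hypothetical, minimal sketch of a CmuTASK record. The field names, the
 * CPU type list and the fixed-size slot table are illustrative assumptions. */
#include <stdint.h>

enum cpu_type { CPU_TYPE_AVR = 0, CPU_TYPE_ARM, CPU_TYPE_MIPS, CPU_TYPE_COUNT };

struct cmutask_binary {
    enum cpu_type  type;    /* which supported CPU type this binary targets */
    uint32_t       length;  /* binary length in bytes; may differ per CPU type */
    const uint8_t *code;    /* CPU-specific machine code emitted by the compiler */
};

struct cmutask {
    uint64_t id;            /* CmuTASK-ID, a positive whole number assigned by the RAM-CONTROLLER */
    struct cmutask_binary binaries[CPU_TYPE_COUNT]; /* one binary per supported CPU type,
                                                       present whether or not that CPU type
                                                       exists in the given system */
    uint32_t       user_data_length;                /* user-added data fields */
    const uint8_t *user_data;
    /* as a record, a CmuTASK has a system dependent maximum size limit */
};
```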

The system uses virtual memory and the FPGA/CPLD at the top of the hasty draft, hereafter the RAM-CONTROLLER, implements it. The RAM-CONTROLLER also has its own, personal, random number generator, which is used for generating a random bitstream for a one-time-pad-like Vernam cipher. If the RAM-CONTROLLER had as much internal memory as the RAM size, then it could use a real one-time pad or, better yet, skip wasting time and electrical power on copying the data from itself to the RAM module and store the original data on its own crystal, but since it does not have that much on-crystal memory, it has to use the RAM module and reuse the generated bit-stream, making the real-time RAM content encryption theoretically breakable. The memory separation is based on CmuTASK-IDs, not CPU-module-IDs. The CmuSHUFFLER tells the RAM-CONTROLLER every time which CPU-module received which CmuTASK, and when the CPU-module that executes CmuTASK TASK_x1 saves any data to RAM, it has to tell the RAM-CONTROLLER which are the IDs of the other CmuTASKs that are allowed to read/write/delete that allocated memory region. The RAM-CONTROLLER can enforce such a memory access scheme, because the CmuSHUFFLER always tells it which CPU-module received which CmuTASK. With the exception of kernel boot code, for security reasons new CmuTASKs are created only by asking the RAM-CONTROLLER to allocate memory for them. CPU-modules do not have write access to CmuTASK-ID fields. CmuTASK-IDs are positive whole numbers that are generated by the RAM-CONTROLLER. CmuTASKs are variable sized records that contain binaries for all supported CPU types, a CmuTASK-ID field, and user-added data fields. CmuTASKs as records have a system dependent maximum size limit.
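A minimal sketch of how the RAM-CONTROLLER might record and enforce those per-region permissions, assuming a simple fixed-size allow-list of CmuTASK-IDs; all names and limits below are made up for illustration.

```c
/* Hypothetical sketch of a RAM-CONTROLLER access check; names and limits
 * are assumptions made for illustration. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHARERS 8  /* assumed per-region limit on explicitly allowed CmuTASK-IDs */

struct memory_region {
    uint64_t owner_cmutask_id;          /* the CmuTASK that allocated the region */
    uint64_t allowed_ids[MAX_SHARERS];  /* other CmuTASK-IDs granted read/write/delete */
    size_t   allowed_count;
};

/* The RAM-CONTROLLER grants access only to the owner or an explicitly listed CmuTASK. */
static bool region_access_allowed(const struct memory_region *r, uint64_t requester_id)
{
    if (requester_id == r->owner_cmutask_id)
        return true;
    for (size_t i = 0; i < r->allowed_count; ++i)
        if (r->allowed_ids[i] == requester_id)
            return true;
    return false;
}
```

Note that the requester's CmuTASK-ID is not something the CPU-module can claim for itself: the RAM-CONTROLLER knows it because the CmuSHUFFLER reports which socket received which CmuTASK.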

To eliminate a situation where one CPU-module tells the RAM-CONTROLLER that it is some other CPU-module, the CPU-module-IDs are based on the CPU-module sockets, not on the CPU-modules themselves. To make it harder for malicious chips to cooperate, the CmuSHUFFLER is not informed what CPUs are on the CPU-modules. Each CPU-module receives a CmuTASK that contains assembler for all CPU types. Each CPU-module is furnished with a cheap CPLD or FPGA that picks the CPU-specific assembler commands from the CmuTASK's set of assembler commands. One of the beneficial properties of this architecture is that CPU-modules can be added during program execution and removed whenever they have either just completed a CmuTASK or not yet received any CmuTASKs.
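What that cheap per-module CPLD/FPGA has to do is essentially a lookup of the slot whose CPU type tag matches the CPU soldered onto that particular module. The sketch below mirrors the CmuTASK sketch above; the types and names are assumptions, not a specification.

```c
/* Hypothetical sketch of the per-CPU-module binary selection. */
#include <stdint.h>
#include <stddef.h>

struct binary_slot {
    int            cpu_type;  /* numeric tag of the CPU type this slot targets */
    uint32_t       length;
    const uint8_t *code;
};

/* Return the binary compiled for this module's own CPU type, or NULL if the
 * CmuTASK carries no binary for it (which should not happen, because the
 * compiler emits code for every supported CPU type). */
static const struct binary_slot *
pick_local_binary(const struct binary_slot *slots, size_t slot_count, int local_cpu_type)
{
    for (size_t i = 0; i < slot_count; ++i)
        if (slots[i].cpu_type == local_cpu_type)
            return &slots[i];
    return NULL;
}
```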

There are some constant CmuTASK-IDs. Those are assigned to the IO-devices, the RAM-CONTROLLER, and the CmuSHUFFLER. IO-devices are part of the general address space. The CmuSHUFFLER needs to access RAM to make it possible to generate executable code, that is, to generate new CmuTASKs that did not exist during program start-up. The different operating system access zones, kernel zone, user-land, jails, etc. can be implemented by using a CmuTASK heritage attribute, the CmuTASK-HERITAGE. The CmuTASK-HERITAGE is the path between the current CmuTASK and the root CmuTASK, written root-first, so that an ancestor's CmuTASK-HERITAGE is a prefix of the CmuTASK-HERITAGE values of all of its descendants. A CmuTASK A can read/write/delete all memory regions that are created by a CmuTASK B whenever the CmuTASK-HERITAGE value of the CmuTASK B has a prefix that exactly matches the whole CmuTASK-HERITAGE value of the CmuTASK A. In addition to the default access scheme, each memory region can have read/write/delete access attributes that list additional CmuTASK-HERITAGE values.
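The default access rule is essentially a prefix comparison. Below is a minimal sketch, assuming the CmuTASK-HERITAGE is stored as an array of CmuTASK-IDs ordered from the root CmuTASK downwards; the names are illustrative only.

```c
/* Hypothetical sketch of the default CmuTASK-HERITAGE access rule. The
 * heritage is assumed to be stored root-first, so that an ancestor's
 * heritage is a prefix of every descendant's heritage. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

struct heritage {
    const uint64_t *ids;    /* CmuTASK-IDs from the root CmuTASK downwards */
    size_t          length;
};

/* CmuTASK A may read/write/delete a region created by CmuTASK B when the
 * whole heritage of A is a prefix of the heritage of B. */
static bool default_access_allowed(const struct heritage *a, const struct heritage *b)
{
    if (a->length > b->length)
        return false;
    for (size_t i = 0; i < a->length; ++i)
        if (a->ids[i] != b->ids[i])
            return false;
    return true;
}
```

The additional per-region read/write/delete attribute lists mentioned above would be checked with the same comparison, once per listed CmuTASK-HERITAGE value.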

Operating system thread execution priorities are implemented by assigning execution probabilities to the CmuTASKs. The execution probabilities are calculated from the CmuTASK-HERITAGE values. The killing of operating system threads is implemented by using a mechanism where the CmuSHUFFLER has a FIFO based KILL-LIST that consists of CmuTASK-HERITAGE values. All CmuTASKs whose CmuTASK-HERITAGE has a prefix that is in the KILL-LIST are discarded, banned from execution. (Due to the fact that CmuTASK-IDs are guaranteed to be unique and that CmuTASK-HERITAGE values are paths between a tree node and the root of the tree, a lot of memory for storing the KILL-LIST can be saved by saving only the CmuTASK-ID instead of the whole CmuTASK-HERITAGE value. All CmuTASKs that have the KILL-LIST-ed CmuTASK-ID anywhere in their CmuTASK-HERITAGE are banned from execution.)
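A sketch of the CmuTASK-ID based form of the KILL-LIST check follows; the FIFO capacity and the names are my own assumptions. A CmuTASK is banned from execution as soon as any CmuTASK-ID in its heritage appears on the list.

```c
/* Hypothetical sketch of the KILL-LIST check. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define KILL_LIST_CAPACITY 64   /* assumed capacity of the FIFO */

struct kill_list {
    uint64_t ids[KILL_LIST_CAPACITY]; /* CmuTASK-IDs whose whole subtree is banned */
    size_t   count;
};

/* A CmuTASK is discarded, banned from execution, as soon as any CmuTASK-ID
 * in its CmuTASK-HERITAGE is found on the KILL-LIST. */
static bool cmutask_is_killed(const struct kill_list *kl,
                              const uint64_t *heritage_ids, size_t heritage_length)
{
    for (size_t i = 0; i < heritage_length; ++i)
        for (size_t j = 0; j < kl->count; ++j)
            if (heritage_ids[i] == kl->ids[j])
                return true;
    return false;
}
```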




(The image can be seen in greater detail by clicking on it.)


Idle threads, interrupts, and events are implemented by having the RAM-CONTROLLER maintain a hashtable that consists of idle CmuTASKs. I call that hashtable ht_idle_CmuTASKS. The RAM-CONTROLLER has an implementation specific registration mechanism (based on the observer design pattern) where CmuTASKs can register themselves, and other CmuTASKs that they have access to, to be moved from the ht_idle_CmuTASKS to the data structure from where the CmuSHUFFLER moves CmuTASKs to its execution queue. Hardware interrupts are events that IO-devices declare to the RAM-CONTROLLER. In addition to IO-devices, CmuTASKs can declare events, and that's how operating system events can be implemented. Event priority is defined at the declaration of the event and it is based on the CmuTASK-HERITAGE of the CmuTASK that declares the event. The closer to the root CmuTASK the declaring CmuTASK is, the higher the priority of the declared event will be.
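A rough sketch of the observer-style registration follows, assuming the idle table and the CmuSHUFFLER-bound queue are plain arrays of CmuTASK-IDs; the real ht_idle_CmuTASKS would be a hashtable, so this only illustrates the data flow when an event fires.

```c
/* Hypothetical sketch of the event/listener mechanism: when an event fires,
 * every registered listener CmuTASK is moved from the idle set to the queue
 * that feeds the CmuSHUFFLER. The plain arrays stand in for the real
 * ht_idle_CmuTASKS hashtable and the shuffler-bound data structure. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LISTENERS 16
#define MAX_TASKS     256

struct event {
    uint64_t listener_ids[MAX_LISTENERS]; /* CmuTASK-IDs registered as observers */
    size_t   listener_count;
    size_t   priority; /* assumed to be derived from the heritage depth of the
                          declaring CmuTASK; closer to the root, higher priority */
};

struct task_set {
    uint64_t ids[MAX_TASKS];
    size_t   count;
};

static bool set_remove(struct task_set *s, uint64_t id)
{
    for (size_t i = 0; i < s->count; ++i)
        if (s->ids[i] == id) {
            s->ids[i] = s->ids[--s->count]; /* order does not matter in the idle set */
            return true;
        }
    return false;
}

/* Fire an event: every listener that is currently idle becomes ready to run. */
static void event_fire(const struct event *e,
                       struct task_set *ht_idle, struct task_set *ready_queue)
{
    for (size_t i = 0; i < e->listener_count; ++i)
        if (ready_queue->count < MAX_TASKS && set_remove(ht_idle, e->listener_ids[i]))
            ready_queue->ids[ready_queue->count++] = e->listener_ids[i];
}
```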

The system uses deadlined cooperative scheduling, meaning that each CmuTASK must complete its execution within a time frame determined by the CmuSHUFFLER, and if it fails to meet the deadline, all RAM-CONTROLLER API calls that involve writing (declaration of events and event listener CmuTASKs, allocation/deletion/modification of memory records, declaration/modification/deletion of access records) are discarded, not committed to globally accessible RAM. Locks of a CmuTASK that fails to meet its deadline are released at the deadline. Locks of destroyed CmuTASKs are released automatically. A CmuTASK is defined to be destroyed if it does not ask itself to be placed to either the ht_idle_CmuTASKS or the CmuSHUFFLER queue. It's a matter of configuration whether CmuTASKs that fail to meet their deadline are placed back to the CmuSHUFFLER queue or destroyed.
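A minimal sketch of the commit-or-discard behaviour at the deadline; the buffer size, the placeholder functions and the names are assumptions made for illustration only.

```c
/* Hypothetical sketch of deadlined cooperative scheduling: write-type
 * RAM-CONTROLLER API calls are buffered and committed to globally accessible
 * RAM only if the CmuTASK meets its deadline. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

struct write_call {
    uint64_t region_id;   /* which memory record the buffered write-type API call touches */
    uint64_t payload;     /* stand-in for the actual call arguments */
};

struct pending_writes {
    struct write_call calls[64];   /* assumed per-CmuTASK buffer size */
    size_t            count;
};

/* Placeholder: a real implementation would ask the RAM-CONTROLLER to make the
 * buffered writes visible in the globally accessible RAM. */
static void commit_writes(const struct pending_writes *w) { (void)w; }

/* Placeholder: a real implementation would release the locks held by the CmuTASK. */
static void release_locks(uint64_t cmutask_id) { (void)cmutask_id; }

/* At the deadline the CmuSHUFFLER/RAM-CONTROLLER pair decides what happens to
 * the CmuTASK's buffered write-type API calls. */
static void apply_deadline_policy(uint64_t cmutask_id, bool finished_before_deadline,
                                  struct pending_writes *w)
{
    if (finished_before_deadline)
        commit_writes(w);   /* deadline met: the buffered calls take effect */
    else
        w->count = 0;       /* deadline missed: nothing reaches globally accessible RAM */
    release_locks(cmutask_id);  /* locks are released at completion or at the deadline */
}
```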




Some Optimizations


Given that the RAM-CONTROLLER is a single point of failure from the security point of view and from other robustness points of view, it might as well be in the same FPGA/CPLD with the CmuSHUFFLER. Those two are supposed to cooperate closely anyway, informing each other of their activities. That also eliminates the need for having two random number generator implementations in the system. It also eliminates the need for transferring CmuTASK data from the RAM-CONTROLLER chip to the CmuSHUFFLER chip. The CmuSHUFFLER queue can be the very same memory region that the RAM-CONTROLLER uses for storing the CmuTASKs that are supposed to be placed into the CmuSHUFFLER queue.

Given that the CPU-modules do whatever they want anyway, including eavesdrop on the CmuTASKs, there might as well be an official RAM-CONTROLLER API that allows the CPU-modules to cache CmuTASKs. That way time can be saved by skipping some data transfer and the CPU-module specific decoding of the CmuTASKs. Reduction of data transfer reduces power consumption, provided that avoiding the data transfer does not increase localized calculations by an amount where the localized calculations consume more power than the data transfer would have.




Initial Ideas for the Proof of Concept of the Doomsday Computer Architecture D1 and Some Further Comments


I will probably edit this blog post later and this chapter will be replaced with some other chapter, but for now this blog post consists of only a description of vaporware. I need some time for the ideas to settle, to sink in. My first thought was to cobble together a quick-and-dirty Ruby file that executes the CmuTASKs by calling eval. After all, the architecture as a whole does not look that difficult. I was hoping that that Ruby code would make a nice, succinct demo of the elegance of the Doomsday Computer Architecture D1, but then I started to think how much code I would have to write to implement the scheduler that uses the CmuTASK-HERITAGE values for calculating execution priorities, and probably at least 20 lines of boilerplate practically ruin the idea that I could create some elegantly succinct demo. The RAM-CONTROLLER memory access rights calculation code is not going to look nice either, further ruining the prospects of an elegantly succinct demo. So, if the demo is going to look ugly anyway, I might as well just go for the real thing with the very first shot. I know that I want to practice formally verified C, where malloc is never used, id est the code should follow the style and requirements of safety critical systems, the ones used for avionics software, MISRA-C, etc. Maybe the Doomsday Computer Architecture D1 might offer a fine set of restrictions and requirements for that kind of embedded C project.

Currently (2016_06_08) it seems that a real beauty of this architecture is that in situations where security and computational performance are not the main concerns, the CPU-modules might be replaced with microcontrollers that are all manufactured by the same vendor. In that case the CmuTASKs can consist of binaries that are compiled for only one CPU type. Maybe the CmuTASKs might be encoded in EmbedVM bytecode regardless of the CPU variety. In some simpler case the CmuSHUFFLER and the RAM-CONTROLLER might be implemented by using microcontrollers. Maybe the whole system, CPU-modules, CmuSHUFFLER, RAM-CONTROLLER, might fit into a multicore Parallax Propeller, which also has an open source HDL implementation, or into some multicore XMOS chip.

One of the technical benefits of the Doomsday Computer Architecture D1 is that most CPU-modules can be switched off when the CmuSHUFFLER queue is "short enough" and its content flows through the CPU-modules "fast enough". Another option is to use the Doomsday Computer Architecture D1 as an architecture for cluster software. The CPU-modules would be replaced with nodes on the network. The nodes do not even have to reside on a LAN. A single node might serve multiple different RAM-CONTROLLER-CmuSHUFFLER pairs, different virtual computers that run totally different Doomsday Computer Architecture D1 specific operating systems. On 2016_06_08 it seems to me that the Doomsday Computer Architecture D1 fits well for implementing the Geth from the Mass Effect 3 game. One of the side effects of the Doomsday Computer Architecture D1 is that it reduces design risks by allowing the computational capacity of the system to be increased without redesigning any of the software. Even recompilation is not needed, not to mention that the whole system can be scaled without needing absolutely any reconfiguration anywhere, and that's probably one of the wild dreams of many hosting companies. (Yes, I know that they have a new corporate nonsense buzz-word for network administration: dev-ops.)
