Visually Defining Eventual Consistency in DHTs

December 8, 2014

DHTs based of Amazon’s Dynamo, such as Cassandra and Riak, claim to be “Eventually Consistent” DHTs. In the following post I attempt to visually represent exactly what that means to me.

I came across the blog post by Werner Vogels, where he explains the significance of N, W, and W variables in zero-hop DHT. Roughly speaking, N is the number of nodes data will be replicated to.W is simply the number of nodes that need to respond to a client when it tries to write data into DHT, whereas R is the number of nodes that need to respond when reading data from a DHT. The goal of this post is also to make more understandable to me exactly what N, R, and W variables are. Screen Shot 2014-12-08 at 12.38.48 PMFor a DHT with 8 nodes, labeled A – H, certain portion of the data is stored on each of the nodes. The node that stores the data is determined by the hash of the portion of the data. If the hash has a value range between 0 – 256, each node is responsible for an equal sub portion of the data corresponding to the hash space.

Additionally, DHTs allow for the data to be replicated to more than one node in case of failure. In the example, N is set to 3, so the data is replicated in three places.  So when W < N, how do the N – W nodes receive the data? According to Werner Vogels article, “a mechanism is in place that applies the updates in a lazy manner to the remaining nodes in the replica’s set”. The time it takes for the update to propagate through is the inconsistency window and is the period during which it is dangerous to read from the DHT, since you may get a different value from the one that was previously written.

However, Werner Vogel claims that this may only happen when W + R <= N. I wanted to understand better what this dynamic relationship between N, R, and W is, so with some help from my professor I drew a graph of what exactly happens. In order to do that, I took some liberties in order to simplify the example. I assumed that events, such as write to DHT, read from DHT, receive a response, etc., only happen at specific time ticks, where a clock tick has predefined time period. I also no two events are concurrent, which means that two nodes cannot receive a multicast message from a client at the same time.

If we rewrite the equation, then R <= N – W. So this means that the system is in danger only when the amount of reads necessary is less than or equal to the number of nodes that are being propagated to lazily. There are three possibilities, either R is less than W, R is equal to W, or R is greater than W. I will focus on the case when R is less than W, although all three cases work the same way.

Screen Shot 2014-12-09 at 1.39.22 PM

The following example shows how the graph looks when R < W. Time is represented on the horizontal line, and the vertical line represents the amount of nodes the request has propagated to. So, at a certain time tick period, tick 1,  client ta adds data to the DHT. At another time tick period, tick 2, W nodes have responded to ta’s request and ta thinks that the data is written into the DHT. However, it takes some time for the data to get out to all N nodes. We call the time for the data to propagate to all N nodes tick 3. Now let’s imagine that a different client, tb executes a get on the data item because it wants to know its content, which can happen either between tick 1 and tick 2, or between tick 2 and tick 3. If the get happens before tick 1 or after tick 3, it is the same as what it was before or after the add. The data is in danger of being different in the read than from the write only if the dispatch from the read starts and ends in the danger area. As an example, let’s say than we have N = 5, W = 4, and R = 1. According to the equation, it is possible to get an inconsistent result since R <= N – W. If it takes one clock tick for the request to propagate to each node, then we are in the danger area for 1 clock tick. At exactly this clock tick it is possible for a node to read from the single incorrect node and get the stale data. This example is illustrated in the graph below, where the purple area represents the time that it takes for the read to propagate to the incorrect node.

Screen Shot 2014-12-09 at 1.37.01 PM

However if the read happened before tick 2, then the read would be correct since the read finished before the write finished. However if we set N = 5, W = 4, and R = 2, then there is no possibility for the read to land in the danger area. If the read occurs between tick 2 and tick 3, then it could only finish after tick 3, at which point the client would get two conflicting results. Similarly if the read starts before tick 2, it could end after tick 2 and before tick 3, but then the results would conflict. Here is a different example.Screen Shot 2014-12-09 at 1.38.28 PM
Here, a read operation starts in between a write operation, and ends after the write operation has completed. Let’s say that N = 5, R = 3, and W = 3. In the following example we should be safe and have no inconsistencies. However if the read operation starts before the write finishes, and ends in the danger area, it is possible to get the old value. Let’s say the read request lands on the node that has the old value in the safe area. In the danger area, the read request lands on the lazily updated nodes before the data has propagated. Thus, tb reads the old value after the write has finished. The time that tb takes to perform the read is illustrated by the purple area in the graph above. The data is still considered correct, however, since the read has started before the write has finished. That verdict implies that, assuming R, W, and N are configured correctly, ta does not offer any guarantees about what the correct data should be before it has finished its write. Thus any read operation started before that may get either the old or the new data.

So, with DHTs, you are essentially able to toggle that window of the danger area. You can set your DHTs to be always consistent with W = N, or you can leave them to be inconsistent for a brief period of time with W < N, but guaranteeing that you will always know about it with R <= N – W. The data will become eventually consistent when it propagates to all N – W nodes. This inconsistency window is an example of practice diverging from theory, where it is theoretically possible to get stale results, but in practice it seldom matters. When dealing with entities such as “Big Data”, sometimes it might be more advantageous to provide availability and eventual consistency, rather than strong consistency but poor availability.

Notes on Git

January 15, 2013

pushing a remote branch to

git remote -v

Try to verify that origin is set to the correct thing.

git push origin branch-name

Hello World in LLVM

January 1, 2011

For some time now, I have been very interested in the compilation process of programming languages and how they are converted to assembly. Naturally, I became very interested in LLVM portable assembly syntax, and how its syntax compares to x86 assembly. It turns out that it is very readable and understandable, unlike the bare bones assembly found when disassembling a program. I found that it is actually a lot easier to analyze x86 assembly after it has been been generated from LLVM as intermediary since the LLVM block labels are commented into the assembly blocks and in general the structure looks very similar. Let us look at an example of a simple hello world program.

@0 = internal constant [20 x i8] c"Hello LLVM-C world!"

declare i32 @puts(i8*)

define void @sayHelloWorld() {
  %0 = call i32 @puts(i8* getelementptr inbounds ([20 x i8]* @0, i32 0, i32 0))
  ret void

Looking through the following example, the global @0 is set with a string. An external function where the body is defined somewhere else needs to be initialized with the declare keyword. A new function, however, needs to be defined along with its body, return type, and parameters. Each function body needs to have at least one label, which in this case is aName. After the block label, the call function is used which calls puts function from an external definition. The function getelementptr returns a pointer to an element specified with certain bounds. When the inbounds keyword is used, access is denied outside of the bounds specified. The register %0 is set with the result of external puts function, and a void is returned. Now I will present the equivalent x86 assembly.

        .section        __TEXT,__text,regular,pure_instructions
        .globl  _sayHelloWorld
        .align  4, 0x90
_sayHelloWorld:                         ## @sayHelloWorld
## BB#0:                                ## %aName
        subq    $8, %rsp
        leaq    ___unnamed_1(%rip), %rdi
        callq   _puts
        addq    $8, %rsp

        .section        __TEXT,__cstring,cstring_literals
        .align  4                       ## @0
        .asciz   "Hello LLVM-C world!"

Let us look at exactly what is going on. The stack pointer is advanced so that we may put local variables onto the stack. The string we would like to output is saved into %rdi register and _puts is called with the parameter. The stack pointer is returned to what it was originally and we return from the function.I think my favorite part about the following code is that LLVM has left us breadcrumbs which can give us insight into the interoperability of the x86 assembly. Particularly, we can see that @sayHelloWorld, %aName, and @0 entry block entry points are provided for us! Even though the following code snippet is not complex, for code blocks of greater complexity this information might be very important for us.

For reference, here is the C code necessary to generate and run the LLVM Hello World example.

what does “P == NP” mean?

December 10, 2010

A solution to a given algorithmic problem may be given in terms of a yes/no answer. NP is defined as polynomial running time necessary to verify that a solution is correct. P is defined as the actual running time to find whether there is a solution, which is polynomial. Therefore, it can easily be seen that something which runs in polynomial time could be easily verified in polynomial time as well, given an instance of the solution, so P is a subset of NP. However, does this mean that P == NP? There might be other algorithms that have different running time magnitudes, however which might be able to be verified in polynomial time.

Now, what does it mean for a problem to be NP-complete? It is a type of problem that could be verified in polynomial time, however there is no known algorithm that could solve the problem in polynomial time. There may be an algorithm, however it has never been discovered. If there is such an algorithm for at least one of these NP-complete problems, this means that there is an algorithm for all of them, and thus in fact all NP-complete problems are in the class P as well.

Therefore, if we know a problem that is already NP-Complete, we can prove another problem being NP-Complete as well. What this means is that if we know the running time of some problem that is NP-Complete, we know the running time of the other problem is at least as fast. So if there is a language L and we know language L’ is NP-Complete, we can reduce all inputs of of L’ to inputs of L such that when we feed input into L’, we have a transformation function f which reduces input instance x of L’ into input instance f(x) of L. From this, if the problem L gives a yes/no solution to input f(x), we know as well that L’ will give us the same answer based on input x. The five steps we must perform are

1. Prove L is an element of NP
2. Select a known NP-complete language L’
3. Describe an algorithm which computes a function f that maps every input instance x of L’ into an input instance f(x) of L
4. Prove that the input x is an element of L’ if and only if f(x) is an element of L
5. Prove that the algorithm necessary to generate f runs in polynomial time

Using Monads for State

March 30, 2010

One of the things that monads have been very useful for is State. Most of the time there is some intermediate structure that is not necessarily part of the result however it is needed in the intermediate calculation in order to come out with the final output. I was very lucky in finding Wadler’s Paper that explains monads by examples, which inspired me to create my own. Conveniently enough, I remembered a function that I wrote a while back for Project Euler where every number that is not the sum of two abundants is added up. The function takes as input a number to check as well as a list of all its known abundant numbers so far.

accumIfNotSumOfTwoAbundants (result,abundants) num
| isAbundant num == True  && isSumOfTwoAbundants abundants num == True  = 
| isAbundant num == False && isSumOfTwoAbundants abundants num == True  = 
| isAbundant num == True  && isSumOfTwoAbundants abundants num == False = 
| isAbundant num == False && isSumOfTwoAbundants abundants num == False = 

The function has a "side effect", in that it checks if the passed number is abundant before checking if the number is the sum of two previously known abundants. If the number itself is abundant, the number needs to be added to the currently passed list of abundants for the next round of calculation. Along with that, the result must be passed, which is the accumulated result so far plus the number passed if the number is sum of two abundants, otherwise only the accumulated result. So, how do we go about finding the result? In order to gain an appreciation of monads, let's not use folds but instead write the recursive calls manually, which will look something like

main = let (a,x) = accumIfNotSumOfTwoAbundants (0,[]) 12
           (b,y) = accumIfNotSumOfTwoAbundants (a,x)  13
           (c,z) = accumIfNotSumOfTwoAbundants (b,y)  14
        in putStrLn (show (c,z))

where c is the result and z is the intermediate list. Now the question is "can we abstract the list in a type since it is essentially global state", and the answer is yes we can! The inputs to the original function need to be slightly modified so that the list "state" is the last parameter being passed rather than being passed as a tuple along with the result. Let's define a new type now

type StateTrans s a = s -> (a,s)

As guessed, s is a generic state and a is the result. Now is the time to define the two famous functions, composition and return.

(>>>=) :: StateTrans s a -> (a -> StateTrans s b) -> StateTrans s b
p >>>= k = \s0 -> let (a,s1) = p s0
                      (b,s2) = k a s1
                   in (b,s2)

unit :: a -> StateTrans s a
unit a = \s -> (a,s)

Notice that I have defined composition as >>>= rather than >>= and return as unit to avoid compiler confusion. Now comes the main driver that pipes everything together.

main = let eval   = accumIfNotSumOfTwoAbundants 0 12 >>>=
             \a  -> accumIfNotSumOfTwoAbundants a 13 >>>=
             \b  -> accumIfNotSumOfTwoAbundants b 14 >>>=
             \c  -> unit c
        in putStrLn . show $ eval []

As noticed, the intermediate list does not need to be passed around every single time, only the result. The initial empty list is passed only once to the function eval. However, the function eval does not take any parameters, so how can we pass an extra parameter? Due to the magic of partial application it turns out that when we call accumIfNotSumOfTwoAbundants 0 12, it is actually a function with type [Integer] -> (Integer, [Integer]). Wow, that looks exactly like our type that we defined earlier! Well, it is just so convenient that it is the first parameter of input of >>>= function as well. That means that p in this case is really the expression accumIfNotSumOfTwoAbundants 0 12. With the combined power of lazy evaluation and partial application s0 does not need to be extracted right away and is instead a thunk that will be evaluated later. So then k a is really the anonymous function \a -> accumIfNotSumOfTwoAbundants a 13. Notice that here a is the result of the first function which is the first parameter to the anonymous function as well. Here, s1 is the intermediate state that came as a result of calculating the first expression, which is then passed to k a, which in turn becomes (\a -> accumIfNotSumOfTwoAbundants a 13) s1. As can be observed, multiple functions may be strung together in such a fashion, in a very similar fashion to unix pipes. It is time to talk about the function unit. It is not necessary in the following example and writing main without unit call would yield same result as with it. However it is good practice to have it since you know exactly the value that you are returning. Interesting to note is that unit may return an intermediate value as well such as

main = let eval   = accumIfNotSumOfTwoAbundants 0 12 >>>=
             \a  -> accumIfNotSumOfTwoAbundants a 13 >>>=
             \b  -> accumIfNotSumOfTwoAbundants b 14 >>>=
             \c  -> unit a
        in putStrLn . show $ eval []

The main thing to realize is that unit takes as a parameter the intermediate, and raises it to the monad which in this case is (result,state). From here on out an actual instance of a monad may be used so that main may be rewritten in a do style notation.

In order to create an instance of Monad, we will have to create a newtype and cannot use type synonym anymore.

newtype StateTrans s a = ST( s -> (a,s) )

The new monad instance will look such as

instance Monad (StateTrans s)
    (ST p) >>= k = ST( \s0 -> let (a,s1) = p s0
                                  (ST q) = k a
                              in q s1 )
    return a = ST( \s -> (a,s) ) 

The ST function essentially acts as a lifting function which transforms into StateTrans type. Another function is necessary, applyST, which drops the result into something usable by other functions.

applyST :: StateTrans s a -> s -> (a,s)
applyST (ST p) s = p s            

Now, we are able to write the main portion of our code in a style that is similar to C syntax.

main = let eval = do 
                a <- ST (accumIfNotSumOfTwoAbundants 0 12);
                b <- ST (accumIfNotSumOfTwoAbundants a 13);
                c <- ST (accumIfNotSumOfTwoAbundants b 14);
                return c
       in putStrLn . show $ applyST eval []

Reference: Haskell and Monads

plotting with R

November 24, 2008

R is a statistical plotting language, similar to Matlab. It’s not very user-friendly though, especially when you try to use it for the first time. In my class, I needed to plot two functions in the same graph, for Voltage and Current through an RL circuit. Here is what I used in order to accomplish that:

> t = seq(-2*pi,2*pi, length=1000)

> f = 4*cos(2*t)

> g = 0.398*cos(2*t-1.466)

> plot(c(t,t), c(f,g), type=”n”, xlab=”t”, ylab=”y”)

> lines(t,g, col=”red”)

> lines(t,f, col=”blue”,lty=2)

> legend(-2.636377,3.731795,”y=0.398cos2t-84″,lty=1,col=”red”)

> legend(-2.636377,4.264303, “y=4cos2t”,lty=2,col=”blue”)

The result I got was

plotting in R

plotting in R

Preserving your OpenAFS installation when switching IP addresses

March 30, 2008

About a week ago during spring break, I actually got OpenAFS running on an old computer that someone gave to me. There are two parts to the configuration, the client and the server. Getting the installation to properly run on the server was a challenge, but it worked. The client on the actual server was working, however when I tried to make it work with another separate computer, it totally crapped out. After a long waiting period in an IRC chatroom, it was noted that I had my /etc/hosts file totally wrong, and the file is not the same for the client and the server. For future reference, this is what the file needs to look like:                localhost

I was on a private network, so I just used ecks-net.priv for the network.ext string. However, whenever I used the vos listvldb command, it was showing me wrong output. Even when I fixed my /etc/hosts file it was showing the incorrect IP address. Apperently OpenAFS keeps a database of the server’s ip address taken from /etc/hosts upon initialization of the software, and keeps that in there no matter if you change your /etc/hosts file or IP address. If something like that happens, or for some reason the client registers the incorrect ip address, this is what you will need to do at the server:

vos syncvldb -server `hostname`  -verbose -localauth
vos syncserv -server `hostname`  -verbose -localauth
vos changeaddr localhost -remove -local
vos listaddr -nore
/etc/init.d/bos shutdown
cd /var/lib/openafs/DB
rm vldb.DB0 vldb.DBSYS1
cd /var/lib/openafs
rm sysid sysid.old
cd /var/log/openafs
rm *
/etc/init.d/openafs-server start
vos listvldb
vos syncvldb `hostname` -local- verbose
vos listvldb

Your vldb should now contain the correct hostname. What we did is we deleted the database file, then regenerated it back again. Here is a reference to the actual IRC log, you can track down the conversation by looking for the “noecksit” string. This is very useful when you change your static IP address and have to reconfigure OpenAFS with the new IP address.

A tic-tac-toe implementation using haskell

March 30, 2008

After about half a year, I can finally say that I am comfortable using haskell as a language. Dealing with the functional part, the biggest stumbling block for me were the Data types, just because you cannot declare Data types in other languages as you do in haskell. In languages like Java or Python you can’t really really declare your own Data types. You can create classes, but that is different since they may not be just containers, they may change the outside world somehow. In haskell, a Data type is just that, a container, kind of like a slot for some value. That value may be restricted to a certain subset, or it may be a special kind of slot (for instance, a tree), but it cannot be executed to change some state outside of itself. The other thing that I haven’t seen before anywhere else is the use of higher-order functions, and mainly being able to pass a function as a parameter to another function. I have never seen this property in math either, but I’m sure it exists, I just haven’t taken such advanced courses yet. That leads me to my topic, which is an AI agent for a tic-tac-toe implementation that I made using haskell. It is available at hackage if anyone wants to try it out. I used two references when I was coding the project, Principles of Artificial Intelligence by Nils Nilsson, and the paper Why Functional Programming Matters by John Hughes. Both of these were excellent references, especially the paper. The program makes extensive use of higher-order functions and function composition. It uses minimax and alpha-beta pruning for its algorithm. Using these functions, I could test out certain functionalities one by one. What I used to do before in imperative languages was print statements, which would get very messy. Here I can just test the input of a function, and as long as it returns the desired output for all my tests, I don’t have to worry about that function at all. That lets me concentrate on one problem, without thinking about other functions introducing some errors. I also used maps and zips in a nice way. I first made a function that given a board, returns a list of all the possibilities of that board, calling it moves. After that, I made a recursive function reptree that in this case takes moves (a function!) as parameter, and a Tree with only one root, mainly the board that we wanted to pass into moves. The reptree function constructs another tree with the board as root, and since the children of the root is a list, it maps reptree moves over the list generated by moves called with the root board as parameter. Therefore, it becomes a recursive function that given a certain board as parameter, can generate a tree of all the possible possibilities ever to occur! OK, that however generates only the board moves, what I really want the actual values of the moves. Therefore I have a function getValue, that for a certain board, outputs a number and that’s all it does. How it actually generates this number is explained in the comments of the source code. All I do then is map the function getValue over the tree that I generated and I get as output a tree of the values. I can then just simply feed that output to maximise which would return to me the most advantageous value. One thing that the paper doesn’t mention and something I didn’t find anywhere is how to get the actual board, since you get a value from maximise but you can’t really do anything with it, all it tells you is the value of the best move, but the actual move is lost. Therefore, I just made a tree zipper that zips the two trees together and returns a tree of tuples consisting of the value and the actual board. We then feed the tuple to maximise, with it only looking at the first value, but actually returning a tuple of the value and the best move. That gives you a best move that the agent should make some time in the future however, considering both the player and the enemy play optimally. We need the best move right for the next move that we need to make, not the move a couple of plies down the line. It’s good thing that I provide a list of history that tells us what moves the board has already generated, so I just backtrack to one move after the current move and I have my best move for the next ply. My code is still not as pretty as I would like it to be, due to a couple of hacks I had to make, but it is a lot better than what I would have had to go through if I wrote this in an imperative language.

gentoo-haskell ebuild

December 25, 2007

I just got done with another semester of college. It was tougher than usual since I had 6 classes. With the little time that I have had, I’ve been intently trying to learn Haskell, which is nothing like I have seen before, since it is purely functional, unlike all the other imperative languages I’m used to. It seems promising, but will probably take a while to fully grasp it. In trying to learn the languages, I’ve thought about making libmpd-haskell work with xmonad to play and stop music with a button. Midway through, someone changed something some feature in xmonad, and it stopped working anymore. Rather than try to fix this in a language I don’t know, I made a nice ebuild for gentoo instead.

Emulating minix using bochs on linux

September 17, 2007

First of all, let me provide an explanation for what minix is and why it may be useful. I remember hearing about minix a long time ago when I was reading Just For Fun: The Story of an Accidental Revolution, where Linux Torvalds and Andrew Tanenbaum, creator of Minix, argued which OS would survive. Linux originated from minix, and even though they are both derived from unix, they differ greatly how they function on the inside, with the most notable feature being the kernel. Linux is monolithic, with all the drivers running in kernel-space, meaning a bug in some driver could crash your computer easily, whereas Minix uses a client-server model for its kernel, where drivers are run in user-space. This introduces some latency, but also introduces reliability, since a driver crash will be the equivalent of a user program crashing, which is harmless to the system. The easiest way to check it out is to put it on a virtual machine and let it run. I chose bochs because it was the easiest to install (qemu wanted me to have a different compiler), and it seems to be the most suitable for testing OS code. The biggest downside is that everything is emulated, which makes it rather slow to run. For regular file editing and browsing it’s fine, but when it has to write to a disk, it can take significantly longer than running it bare on the machine. The hardest part for me was getting minix to run, so I will explain it in detail. Once you install/compile bochs on your HDD, it will look for a .bochsrc file in your user directory (/home/username or ~). You can acquire minix through downloading from their website, or you may already have a CD-ROM with minix, as in my case. In our .bochsrc file, we will first need to specify a romimage and vgaromimage that bochs can load. Bochs comes with a default one, I used it and it was fine. They’re located usually under /usr/share/bochs, but may be different depending on your system. The names of the files are BIOS-bochs-latest and VGABIOS-lgpl-latest. The first two lines, if you plan to use the default images, should be:

romimage: file=/usr/share/bochs/BIOS-bochs-latest, address=0xf0000
vgaromimage: file=/usr/share/bochs/VGABIOS-lgpl-latest

Now is the time to emulate a CD-ROM from which to boot minix. The power of bochs lies it the fact that it can very easily emulate devices, and they can be pretty much any file that you specify. So, in the case that you are booting from a CD-ROM, and the name of the device is /dev/cdrom (you may have to adjust), your line to add would be

ata0-master: type=cdrom, path=/dev/cdrom, status=inserted

In the case that you have only the minix iso file, the line to add would be

ata0-master: type=cdrom, path=/path/to/iso/file, status=inserted

For whichever type you use, at the end you will need to specify to boot from cdrom:

boot: disk

Now you should try running minix by typing “bochs” at the prompt. You will see the Main Menu of Bochs. Specify [5] (the default) to get on with the simulation. It should present you with a nice bootup screen and after the OS loads, you should be able to login with “root”. This is a live CD so you can explore the structure and all from the CD image directly. Of course, you’d want to install it to disk to be able to change minix around and save its current state. In order to do that, you will need to save the OS to disk. In order to do that, you will need to make an image file on your HDD, which acts as a partition for your virtual machine. You can use dd for that, but in order to make it easier, bximage should be used, which comes with bochs. After starting it up, you should see

Do you want to create a floppy disk image or a hard disk image?
Please type hd or fd. [hd]

Specify the default, [hd]. After that you would get

What kind of image should I create?
Please type flat, sparse or growing. [flat]

The default, [flat], should be specified again. You should now see

Enter the hard disk size in megabytes, between 1 and 129023

Minix does not need a lot of space to run, however I read somewhere that you should allocate 1 GB in order to install anything you want on there, so I specified I typed in 1024 MB, and if you have the space and plan to add packages, should too. The minix website specifies that for minimum you will need 50 MB, and if you want to have all the sources, you will need 600 MB. After you specify how much space you want to allocate, the program will give a diagnostic message of what it will do, and ask you to specify the name of the image

What should I name the image?

I chose the name “minix.img” but it’s up to you what you want to name it. After writing to the HDD, it will print out a line which you should add to your .bochsrc file

ata0-master: type=disk, path="minix.img", mode=flat, cylinders=cn, heads=hn, spt=sn

You can add that line, but you will need to change ata0-master to ata1-master, considering you still have the line where your ata0-master corresponds to your CD-ROM image, and change the boot: parameter to point to disk. This is what my final .bochsrc file looked like

romimage: file=/usr/share/bochs/BIOS-bochs-latest, address=0xf0000
vgaromimage: file=/usr/share/bochs/VGABIOS-lgpl-latest
ata0-master: type=cdrom, path=/dev/cdrom, status=inserted
ata1-master: type=disk, path="minix.img", mode=flat, cylinders=2080, heads=16, spt=63
boot: cdrom

From there, running minix and setting it up on your HDD should be a breeze. After logging in with “root”, run setup and accept all the defaults. There should be a partition that you can now install to. After a considerate amount of time of installing packages (considering you did a Full Install), you should get a screen telling you that everything succeeded. After that, exit bochs and just change the boot: parameter to point to disk and you are done. Next time you run the simulator, it should boot from the image file and you should be presented with you brand new minix installation.


Get every new post delivered to your Inbox.