Notes on Git

January 15, 2013

pushing a local branch to github.com:

git remote -v

Verify that origin is set to the correct remote.

git push origin branch-name

Hello World in LLVM

January 1, 2011

For some time now, I have been very interested in the compilation process of programming languages and how they are converted to assembly. Naturally, I became very interested in LLVM's portable assembly and how it compares to x86 assembly. It turns out that it is very readable and understandable, unlike the bare-bones assembly found when disassembling a program. I found that x86 assembly is actually a lot easier to analyze after it has been generated from LLVM as an intermediary, since the LLVM block labels are commented into the assembly blocks and in general the structure looks very similar. Let us look at an example of a simple hello world program.


@0 = internal constant [20 x i8] c"Hello LLVM-C world!\00"

declare i32 @puts(i8*)

define void @sayHelloWorld() {
aName:
  %0 = call i32 @puts(i8* getelementptr inbounds ([20 x i8]* @0, i32 0, i32 0))
  ret void
}

Looking through the example above, the global @0 is set to a string constant. An external function whose body is defined somewhere else needs to be brought in with the declare keyword. A new function, however, needs to be defined along with its body, return type, and parameters. Each function body needs to have at least one label, which in this case is aName. After the block label, the call instruction invokes the externally declared puts function. The getelementptr instruction computes a pointer to an element within the specified bounds; when the inbounds keyword is used, the result is undefined if the computed address falls outside the bounds of the object. The register %0 is set with the result of the external puts call, and a void is returned. Now I will present the equivalent x86 assembly.


        .section        __TEXT,__text,regular,pure_instructions
        .globl  _sayHelloWorld
        .align  4, 0x90
_sayHelloWorld:                         ## @sayHelloWorld
Leh_func_begin0:
## BB#0:                                ## %aName
        subq    $8, %rsp
Ltmp0:
        leaq    ___unnamed_1(%rip), %rdi
        callq   _puts
        addq    $8, %rsp
        ret
Leh_func_end0:

        .section        __TEXT,__cstring,cstring_literals
        .align  4                       ## @0
___unnamed_1:
        .asciz   "Hello LLVM-C world!"

Let us look at exactly what is going on. The stack pointer is advanced so that we may put local variables onto the stack. The address of the string we would like to output is loaded into the %rdi register and _puts is called with that parameter. The stack pointer is restored to what it was originally and we return from the function. I think my favorite part about this code is that LLVM has left us breadcrumbs which give us insight into how the x86 assembly corresponds to the IR. In particular, we can see that the @sayHelloWorld, %aName, and @0 entry points are commented in for us! Even though this snippet is not complex, for code blocks of greater complexity this information can be very important.

For reference, here is the C code necessary to generate and run the LLVM Hello World example.

what does “P == NP” mean?

December 10, 2010

A solution to a given algorithmic problem may be given in terms of a yes/no answer. NP is the class of problems for which a proposed solution can be verified in polynomial time. P is the class of problems for which a solution can actually be found in polynomial time. It can easily be seen that anything solvable in polynomial time can also be verified in polynomial time, given an instance of the solution, so P is a subset of NP. However, does this mean that P == NP? There might be problems whose solutions can be verified in polynomial time but whose best known solving algorithms have greater running-time magnitudes.

Now, what does it mean for a problem to be NP-complete? It is a problem in NP (so a solution can be verified in polynomial time) to which every other problem in NP can be reduced in polynomial time, and for which no polynomial-time solving algorithm is known. There may be such an algorithm; it has just never been discovered. If a polynomial-time algorithm is found for even one NP-complete problem, then there is one for all of them, and thus in fact all NP-complete problems are in the class P as well.

Therefore, if we know a problem that is already NP-complete, we can use it to prove another problem NP-complete as well. The idea is that the new problem is at least as hard as the known NP-complete one: if we could solve the new problem quickly, we could solve the known one quickly too. So if there is a language L, and we know the language L’ is NP-complete, we reduce L’ to L: we give a transformation function f which maps every input instance x of L’ into an input instance f(x) of L. From this, if the problem L gives a yes/no answer on input f(x), we know that L’ gives the same answer on input x. The five steps we must perform are

1. Prove L is an element of NP
2. Select a known NP-complete language L’
3. Describe an algorithm which computes a function f that maps every input instance x of L’ into an input instance f(x) of L
4. Prove that the input x is an element of L’ if and only if f(x) is an element of L
5. Prove that the algorithm necessary to generate f runs in polynomial time
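The reduction in steps 3 through 5 can be stated formally. Writing \(\le_p\) for polynomial-time reducibility (standard notation, not from the original post):

```latex
L' \le_p L \iff \exists\, f \text{ computable in polynomial time such that }
\forall x:\; x \in L' \Leftrightarrow f(x) \in L
```

Together with step 1 (membership of L in NP), such a reduction from a known NP-complete L' establishes that L is NP-complete.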

Using Monads for State

March 30, 2010

One of the things that monads have been very useful for is state. Most of the time there is some intermediate structure that is not necessarily part of the result, but is needed during the computation in order to arrive at the final output. I was very lucky in finding Wadler’s Paper that explains monads by examples, which inspired me to create my own. Conveniently enough, I remembered a function that I wrote a while back for Project Euler, which adds up every number that is not the sum of two abundant numbers. The function takes as input a number to check as well as a list of all the abundant numbers known so far.


accumIfNotSumOfTwoAbundants (result, abundants) num
  | abundant     && sumOfTwo = (result,       num:abundants)
  | not abundant && sumOfTwo = (result,       abundants)
  | abundant                 = (num + result, num:abundants)
  | otherwise                = (num + result, abundants)
  where abundant = isAbundant num
        sumOfTwo = isSumOfTwoAbundants abundants num
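
The two predicates used above are assumed to be defined elsewhere. A minimal sketch of what they might look like, using a naive trial-division divisor sum (the helper properDivisors and these exact definitions are mine, for illustration, not the original code):

```haskell
-- Proper divisors of n, by naive trial division up to n/2.
properDivisors :: Integer -> [Integer]
properDivisors n = [d | d <- [1 .. n `div` 2], n `mod` d == 0]

-- A number is abundant when its proper divisors sum to more than itself.
isAbundant :: Integer -> Bool
isAbundant n = sum (properDivisors n) > n

-- Is num the sum of two numbers taken from the known list of abundants?
isSumOfTwoAbundants :: [Integer] -> Integer -> Bool
isSumOfTwoAbundants abundants num =
  any (\a -> (num - a) `elem` abundants) abundants
```

For example, isAbundant 12 is True (1+2+3+4+6 = 16 > 12), and isSumOfTwoAbundants [12] 24 is True since 24 = 12 + 12.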


The function has a "side effect", in that it checks if the passed number is abundant before checking if the number is the sum of two previously known abundants. If the number itself is abundant, the number needs to be added to the currently passed list of abundants for the next round of calculation. Along with that, the result must be passed, which is the accumulated result so far plus the number passed if the number is sum of two abundants, otherwise only the accumulated result. So, how do we go about finding the result? In order to gain an appreciation of monads, let's not use folds but instead write the recursive calls manually, which will look something like


main = let (a,x) = accumIfNotSumOfTwoAbundants (0,[]) 12
           (b,y) = accumIfNotSumOfTwoAbundants (a,x)  13
           (c,z) = accumIfNotSumOfTwoAbundants (b,y)  14
        in putStrLn (show (c,z))


where c is the result and z is the intermediate list. Now the question is: can we abstract the list into a type, since it is essentially global state? The answer is yes we can! The inputs to the original function need to be slightly modified so that the list "state" becomes the last parameter rather than travelling in a tuple with the result. Let's define a new type now


type StateTrans s a = s -> (a,s)


As guessed, s is a generic state and a is the result. Now is the time to define the two famous functions, composition and return.


(>>>=) :: StateTrans s a -> (a -> StateTrans s b) -> StateTrans s b
p >>>= k = \s0 -> let (a,s1) = p s0
                      (b,s2) = k a s1
                   in (b,s2)

unit :: a -> StateTrans s a
unit a = \s -> (a,s)


Notice that I have defined composition as >>>= rather than >>= and return as unit, to avoid clashing with the Prelude's own (>>=) and return. Now comes the main driver that pipes everything together.


main = let eval   = accumIfNotSumOfTwoAbundants 0 12 >>>=
             \a  -> accumIfNotSumOfTwoAbundants a 13 >>>=
             \b  -> accumIfNotSumOfTwoAbundants b 14 >>>=
             \c  -> unit c
        in putStrLn . show $ eval []


As noticed, the intermediate list no longer needs to be passed around every single time, only the result. The initial empty list is passed only once, to the function eval. However, eval does not appear to take any parameters, so how can we pass it one? Due to the magic of partial application, it turns out that accumIfNotSumOfTwoAbundants 0 12 is itself a function, with type [Integer] -> (Integer, [Integer]). Wow, that looks exactly like the type we defined earlier! Conveniently, that is also the type of the first parameter of the >>>= function, which means that p in this case is really the expression accumIfNotSumOfTwoAbundants 0 12. With the combined power of lazy evaluation and partial application, s0 does not need to be extracted right away and is instead a thunk that will be evaluated later.

The k in >>>= is the anonymous function \a -> accumIfNotSumOfTwoAbundants a 13. Notice that a, the result of the first function, is the parameter of the anonymous function as well. Here, s1 is the intermediate state that came out of evaluating the first expression; it is passed on so that k a s1 becomes accumIfNotSumOfTwoAbundants a 13 applied to s1. As can be observed, multiple functions may be strung together in this way, much like unix pipes.

It is time to talk about the function unit. It is not strictly necessary in this example, and writing main without the unit call would yield the same result. However, it is good practice to have it, since it makes explicit exactly which value you are returning. It is interesting to note that unit may return an intermediate value as well, such as


main = let eval   = accumIfNotSumOfTwoAbundants 0 12 >>>=
             \a  -> accumIfNotSumOfTwoAbundants a 13 >>>=
             \b  -> accumIfNotSumOfTwoAbundants b 14 >>>=
             \c  -> unit a
        in putStrLn . show $ eval []


The main thing to realize is that unit takes the intermediate value as a parameter and lifts it into the monad, which in this case is (result, state). From here on, an actual instance of Monad may be used so that main can be rewritten in do-notation.

In order to create an instance of Monad, we will have to create a newtype; we can no longer use a type synonym.


newtype StateTrans s a = ST( s -> (a,s) )


The new Monad instance will look like this


instance Monad (StateTrans s)
  where
    (ST p) >>= k = ST( \s0 -> let (a,s1) = p s0
                                  (ST q) = k a
                              in q s1 )
                                  
    return a = ST( \s -> (a,s) ) 


The ST constructor essentially acts as a lifting function that wraps a state-transition function into the StateTrans type. One more function is necessary, applyST, which unwraps it again so the result can be applied to an initial state and used by other functions.


applyST :: StateTrans s a -> s -> (a,s)
applyST (ST p) s = p s            


Now we are able to write the main portion of our code in an imperative style reminiscent of C.


main = let eval = do 
       {
                a <- ST (accumIfNotSumOfTwoAbundants 0 12);
                b <- ST (accumIfNotSumOfTwoAbundants a 13);
                c <- ST (accumIfNotSumOfTwoAbundants b 14);
                return c
       }        
       in putStrLn . show $ applyST eval []


Reference: Haskell and Monads

plotting with R

November 24, 2008

R is a statistical plotting language, similar to Matlab. It’s not very user-friendly though, especially when you try to use it for the first time. In my class, I needed to plot two functions in the same graph, for Voltage and Current through an RL circuit. Here is what I used in order to accomplish that:

> t = seq(-2*pi, 2*pi, length=1000)

> f = 4*cos(2*t)

> g = 0.398*cos(2*t-1.466)

> plot(c(t,t), c(f,g), type="n", xlab="t", ylab="y")

> lines(t, g, col="red")

> lines(t, f, col="blue", lty=2)

> legend(-2.636377, 3.731795, "y=0.398cos2t-84", lty=1, col="red")

> legend(-2.636377, 4.264303, "y=4cos2t", lty=2, col="blue")

The result I got was

plotting in R


Preserving your OpenAFS installation when switching IP addresses

March 30, 2008

About a week ago during spring break, I actually got OpenAFS running on an old computer that someone gave to me. There are two parts to the configuration, the client and the server. Getting the installation to properly run on the server was a challenge, but it worked. The client on the actual server was working; however, when I tried to make it work with another, separate computer, it totally crapped out. After a long waiting period in an IRC chatroom, it was noted that I had my /etc/hosts file totally wrong, and that the file is not the same for the client and the server. For future reference, this is what the file needs to look like:


127.0.0.1                localhost
your.static.ip.addr      hostname.network.ext

I was on a private network, so I just used ecks-net.priv for the network.ext string. However, whenever I used the vos listvldb command, it was showing me wrong output. Even when I fixed my /etc/hosts file it was showing the incorrect IP address. Apparently OpenAFS keeps a database of the server’s IP address taken from /etc/hosts upon initialization of the software, and keeps it there even if you change your /etc/hosts file or IP address. If something like that happens, or the client registers the incorrect IP address for some other reason, this is what you will need to do at the server:


vos syncvldb -server `hostname`  -verbose -localauth
vos syncserv -server `hostname`  -verbose -localauth
vos changeaddr localhost -remove -local
vos listaddr -nore
/etc/init.d/bos shutdown
cd /var/lib/openafs/DB
rm vldb.DB0 vldb.DBSYS1
cd /var/lib/openafs
rm sysid sysid.old
cd /var/log/openafs
rm *
/etc/init.d/openafs-server start
vos listvldb
vos syncvldb `hostname` -localauth -verbose
vos listvldb

Your VLDB should now contain the correct hostname. What we did was delete the database file and then regenerate it. Here is a reference to the actual IRC log; you can track down the conversation by looking for the “noecksit” string. This is very useful when you change your static IP address and have to reconfigure OpenAFS with the new one.

A tic-tac-toe implementation using haskell

March 30, 2008

After about half a year, I can finally say that I am comfortable using haskell as a language. Dealing with the functional part, the biggest stumbling block for me was the Data types, just because you cannot declare Data types in other languages the way you do in haskell. In languages like Java or Python you can’t really declare your own Data types. You can create classes, but that is different, since they may not be just containers; they may change the outside world somehow. In haskell, a Data type is just that, a container, kind of like a slot for some value. That value may be restricted to a certain subset, or it may be a special kind of slot (for instance, a tree), but it cannot be executed to change some state outside of itself. The other thing that I haven’t seen anywhere else is the use of higher-order functions, namely being able to pass a function as a parameter to another function. I have never seen this property in math either, but I’m sure it exists; I just haven’t taken such advanced courses yet. That leads me to my topic, which is an AI agent for a tic-tac-toe implementation that I made using haskell. It is available at hackage if anyone wants to try it out. I used two references when coding the project: Principles of Artificial Intelligence by Nils Nilsson, and the paper Why Functional Programming Matters by John Hughes. Both were excellent references, especially the paper. The program makes extensive use of higher-order functions and function composition, and it uses minimax with alpha-beta pruning for its algorithm. Using these functions, I could test out certain functionalities one by one. What I used to do before in imperative languages was scatter print statements, which would get very messy. Here I can just test the input of a function, and as long as it returns the desired output for all my tests, I don’t have to worry about that function at all.
That lets me concentrate on one problem, without thinking about other functions introducing errors. I also used maps and zips in a nice way. I first made a function that, given a board, returns a list of all the possible successors of that board, calling it moves. After that, I made a recursive function reptree that takes moves (a function!) as a parameter, along with a Tree with only one root, namely the board that we want to pass into moves. The reptree function constructs another tree with the board as root, and since the children of the root form a list, it maps reptree moves over the list generated by calling moves with the root board as parameter. Therefore, it becomes a recursive function that, given a certain board, can generate a tree of all the positions that could ever occur! OK, that generates only the board moves; what I really want is the actual values of the moves. Therefore I have a function getValue that, for a certain board, outputs a number, and that’s all it does. How it actually generates this number is explained in the comments of the source code. All I do then is map getValue over the tree that I generated, and I get as output a tree of the values. I can then simply feed that output to maximise, which returns the most advantageous value. One thing that the paper doesn’t mention, and something I didn’t find anywhere, is how to get the actual board: maximise gives you a value, but you can’t really do anything with it, since all it tells you is the value of the best move, while the move itself is lost. Therefore, I made a tree zipper that zips the two trees together and returns a tree of tuples consisting of the value and the actual board. We then feed the tuples to maximise, with it only looking at the first element, but actually returning a tuple of the value and the best move.
That gives you the best move that the agent should make at some point in the future, assuming both the player and the enemy play optimally. But we need the best move for the very next ply, not a move a couple of plies down the line. It’s a good thing that I keep a history list that tells us what moves the board has already generated, so I just backtrack to one move after the current move and I have my best move for the next ply. My code is still not as pretty as I would like it to be, due to a couple of hacks I had to make, but it is a lot better than what I would have gone through if I wrote this in an imperative language.
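The tree machinery described above closely follows Hughes’ paper. Here is a rough sketch of what it might look like (my reconstruction for illustration; the Tree type and the names maptree and ziptree are assumptions, not code from the package):

```haskell
data Tree a = Node a [Tree a] deriving (Show, Eq)

-- Build the (possibly infinite) game tree: the children of each node
-- come from applying the move generator to that node's value.
reptree :: (a -> [a]) -> a -> Tree a
reptree moves board = Node board (map (reptree moves) (moves board))

-- Map a valuation function over every node, e.g. getValue over boards.
maptree :: (a -> b) -> Tree a -> Tree b
maptree f (Node x ts) = Node (f x) (map (maptree f) ts)

-- Zip the value tree back together with the board tree, so that
-- maximise can compare values while still carrying the actual move.
ziptree :: Tree a -> Tree b -> Tree (a, b)
ziptree (Node a as) (Node b bs) = Node (a, b) (zipWith ziptree as bs)
```

Laziness is what makes this workable: reptree describes the whole game tree, but only the parts that maximise actually inspects are ever built.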

gentoo-haskell ebuild

December 25, 2007

I just got done with another semester of college. It was tougher than usual since I had 6 classes. With the little time that I have had, I’ve been intently trying to learn Haskell, which is unlike anything I have seen before, since it is purely functional, unlike all the imperative languages I’m used to. It seems promising, but will probably take a while to fully grasp. In trying to learn the language, I thought about making libmpd-haskell work with xmonad to play and stop music with a button. Midway through, someone changed some feature in xmonad, and it stopped working. Rather than try to fix this in a language I don’t know, I made a nice ebuild for gentoo instead.

Emulating minix using bochs on linux

September 17, 2007

First of all, let me provide an explanation of what minix is and why it may be useful. I remember hearing about minix a long time ago when I was reading Just For Fun: The Story of an Accidental Revolution, where Linus Torvalds and Andrew Tanenbaum, the creator of Minix, argued over which OS would survive. Linux originated from minix, and even though they are both derived from unix, they differ greatly in how they function on the inside, with the most notable difference being the kernel. Linux is monolithic, with all the drivers running in kernel-space, meaning a bug in some driver could easily crash your computer, whereas Minix uses a client-server model for its kernel, where drivers run in user-space. This introduces some latency, but also reliability, since a driver crash is the equivalent of a user program crashing, which is harmless to the system.

The easiest way to check it out is to put it on a virtual machine and let it run. I chose bochs because it was the easiest to install (qemu wanted me to have a different compiler), and it seems to be the most suitable for testing OS code. The biggest downside is that everything is emulated, which makes it rather slow to run. For regular file editing and browsing it’s fine, but when it has to write to a disk, it can take significantly longer than running bare on the machine.

The hardest part for me was getting minix to run, so I will explain it in detail. Once you install/compile bochs, it will look for a .bochsrc file in your user directory (/home/username or ~). You can acquire minix by downloading it from their website, or you may already have a CD-ROM with minix, as in my case. In our .bochsrc file, we will first need to specify a romimage and vgaromimage that bochs can load. Bochs comes with default ones; I used them and they were fine. They are usually located under /usr/share/bochs, but this may differ depending on your system. The names of the files are BIOS-bochs-latest and VGABIOS-lgpl-latest. The first two lines, if you plan to use the default images, should be:

romimage: file=/usr/share/bochs/BIOS-bochs-latest, address=0xf0000
vgaromimage: file=/usr/share/bochs/VGABIOS-lgpl-latest

Now it is time to emulate a CD-ROM from which to boot minix. The power of bochs lies in the fact that it can very easily emulate devices, which can be backed by pretty much any file that you specify. So, in the case that you are booting from a physical CD-ROM whose device name is /dev/cdrom (you may have to adjust), the line to add would be

ata0-master: type=cdrom, path=/dev/cdrom, status=inserted

In the case that you have only the minix iso file, the line to add would be

ata0-master: type=cdrom, path=/path/to/iso/file, status=inserted

For whichever type you use, at the end you will need to specify to boot from cdrom:

boot: cdrom

Now you should try running minix by typing “bochs” at the prompt. You will see the Main Menu of Bochs. Specify [5] (the default) to get on with the simulation. It should present you with a nice bootup screen, and after the OS loads you should be able to log in with “root”. This is a live CD, so you can explore the structure and all from the CD image directly. Of course, you’d want to install it to disk to be able to change minix around and save its current state. To do that, you will need to make an image file on your HDD, which acts as a partition for your virtual machine. You can use dd for that, but to make it easier you should use bximage, which comes with bochs. After starting it up, you should see


Do you want to create a floppy disk image or a hard disk image?
Please type hd or fd. [hd]

Specify the default, [hd]. After that you would get


What kind of image should I create?
Please type flat, sparse or growing. [flat]

The default, [flat], should be specified again. You should now see


Enter the hard disk size in megabytes, between 1 and 129023
[10]

Minix does not need a lot of space to run; however, I read somewhere that you should allocate 1 GB in order to install anything you want on there, so I typed in 1024 MB, and if you have the space and plan to add packages, you should too. The minix website specifies a minimum of 50 MB, and 600 MB if you want to have all the sources. After you specify how much space you want to allocate, the program will give a diagnostic message of what it will do, and ask you to specify the name of the image


What should I name the image?
[c.img]

I chose the name “minix.img” but it’s up to you what you want to name it. After writing to the HDD, it will print out a line which you should add to your .bochsrc file


ata0-master: type=disk, path="minix.img", mode=flat, cylinders=cn, heads=hn, spt=sn

You can add that line, but you will need to change ata0-master to ata1-master, since you still have the line where ata0-master corresponds to your CD-ROM image; leave the boot: parameter pointing at cdrom for now. This is what my final .bochsrc file looked like

romimage: file=/usr/share/bochs/BIOS-bochs-latest, address=0xf0000
vgaromimage: file=/usr/share/bochs/VGABIOS-lgpl-latest
ata0-master: type=cdrom, path=/dev/cdrom, status=inserted
ata1-master: type=disk, path="minix.img", mode=flat, cylinders=2080, heads=16, spt=63
boot: cdrom

From there, running minix and setting it up on your HDD should be a breeze. After logging in with “root”, run setup and accept all the defaults. There should be a partition that you can now install to. After a considerable amount of time installing packages (assuming you did a Full Install), you should get a screen telling you that everything succeeded. After that, exit bochs, change the boot: parameter to point to disk, and you are done. Next time you run the simulator, it should boot from the image file and you should be presented with your brand new minix installation.

firefox and dns security through ssh tunnel

September 6, 2007

Thanks to tor and a packet sniffer, I was able to conclude that firefox by default leaks its DNS requests when using a SOCKS v5 proxy. What happens is that even though the rest of the traffic is encrypted, the name lookups leak through in the clear, and a knowledgeable administrator looking at the logs can conclude that we are encrypting our traffic. For instance, a request for facebook.com would look like this:


DNS REQUEST: www.facebook.com

SSH ENCRYPTED TRAFFIC

The content of the site, which the HTTP proxy would normally block, travels encrypted through the tunnel, yet the lookup of its name goes out in the clear for anyone watching the logs to see.

To get rid of this disparity, we need to change a setting in firefox. At the url bar type in about:config, then look for network.proxy.socks_remote_dns and change it from its default of false to true. In this way, your DNS traffic will be routed through the SOCKS proxy rather than leaked.

