mobifs, part two: aggfs

Playing with the idea of splitting files and mapping their tree structure onto the filesystem, I came to the conclusion that the opposite pattern could be more than useful: an aggregate filesystem.

What and why? Imagine a large project with hundreds of files, where you have to do a huge refactoring (a rename, for example) that affects more than 50% of the files. Opening each file is out of the question. You could write some sed scripts to do the job. But why? Why should you not be able to open all the files at once in vi or your favourite editor, first do some analysis by searching for occurrences, then do a mass replace, and so on?

The aggregate filesystem would have a configuration file that defines what is aggregated, either by listing the files or by defining rules (regular expressions or globbing, for example) for which files to include (for example, all files under a directory). You would end up with one huge file composed of all those files, in no particular order. The end of each original file would be marked with a special line, which I will call the pivot line. These pivot lines would unambiguously identify the end of a file. Removing such a line would mean that the content flows into the file that follows. The pivot line would also contain the full path of the original file.

When you open an aggregate file and start to read it, the aggregate fs would open all the aggregated files (other strategies could be defined eventually) and read them one by one. When you save the aggregate file, the aggregate fs would write each part of the content back to the original file named by its pivot line.
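The read and save behaviour can be tried out without any filesystem at all. Below is a minimal Python sketch under my own assumptions: the pivot format (`PIVOT_PREFIX`) and the function names `aggregate` and `split_back` are hypothetical, and a real aggfs would do this inside its read and write handlers rather than on whole strings.

```python
from pathlib import Path

# Hypothetical pivot marker; the idea above does not fix a concrete format.
PIVOT_PREFIX = "#### aggfs-pivot: "


def aggregate(paths):
    """Join the files into one text; each file is followed by its pivot line."""
    parts = []
    for path in paths:
        text = Path(path).read_text()
        if text and not text.endswith("\n"):
            text += "\n"  # keep the pivot on a line of its own
        parts.append(text + PIVOT_PREFIX + str(path) + "\n")
    return "".join(parts)


def split_back(aggregated):
    """Write every chunk to the file named by the pivot line that ends it."""
    chunk = []
    for line in aggregated.splitlines(keepends=True):
        if line.startswith(PIVOT_PREFIX):
            target = line[len(PIVOT_PREFIX):].rstrip("\n")
            Path(target).write_text("".join(chunk))
            chunk = []
        else:
            chunk.append(line)
    # Text after the last pivot has no owner; hand it back to the caller.
    return "".join(chunk)
```

A mass rename then becomes `split_back(aggregate(paths).replace(old, new))`. Deleting a pivot line before saving makes that file's content flow into the next one, as described above; in this sketch the file whose pivot was removed is simply left untouched.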

Such pivot lines would also help to create commands inside your editor that act on the original file, for example to open it.
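An editor command of that kind only needs to know which original file the cursor is in. Since a pivot line closes its file, that is the first pivot at or below the cursor. A small Python sketch, where the pivot format and the name `original_file_at` are my assumptions:

```python
# Hypothetical pivot marker; any unambiguous line format would do.
PIVOT_PREFIX = "#### aggfs-pivot: "


def original_file_at(aggregated, line_number):
    """Return the path of the file that owns the given 1-based line.

    The owner is named by the first pivot line at or below that line.
    Returns None when the line comes after the last pivot.
    """
    lines = aggregated.splitlines()
    for line in lines[line_number - 1:]:
        if line.startswith(PIVOT_PREFIX):
            return line[len(PIVOT_PREFIX):]
    return None
```

An "open the original file" command would pass the cursor line to this function and open the returned path.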

Such a filesystem could be implemented both in the kernel and in userspace with fuse.

some git naming best practices

I have git repositories in several places. Weighing the different forces, I came to some personal conventions that will, I hope, help me pull from and push to the correct place. For all my git repositories I have basically four categories of locals/remotes:

1. The working area. The working area can be on different machines, so I name these remotes after the machine. Such remotes make sense: when working on machine X, machine Y is a remote to X. So on each copy/mirror of the git repo I will have remotes X, Y, Z…, where X, Y, Z… are the names of the machines I work on. So far I am not sharing repos with other users on the same machine. If that were the case, <machine>.<user> would probably make sense.

2. Git bare space. I call it simply “bare”. This is a remote for all other copies/mirrors and lives somewhere on my local network. In my workflow I normally always pull from and push to it. I have some cron jobs that frequently pull and push (when needed) against the main repositories, such as github or the project's origin.

3. Github and Gitorious.

4. Origin. I use this name for the remote of projects for which I do not keep a local bare copy/mirror. For these it really makes sense to think of one abstract remote to pull from and push to.

git clearcase

Well, the experience with umview was interesting, but too complicated :). Working with GIT_WORK_TREE, or with core.worktree in .git/config, solves the problem, but you are not “inside” the directory.

***** old post follows ****

Under this name I will try to write a few articles about my design and experience in using git on top of clearcase in order to achieve two things:

  • make the enterprise happy by keeping the sources in a central repository
  • make my workflow much easier by using the power of git and avoiding dead time while waiting for slow clearcase operations such as view setup, aligning with a baseline, creating a branch, merging, etc. The goal is to do all necessary development steps fast with git (including collaboration with other developers), and to let the computer do the rest of the required formalities against clearcase (checkout, “edit”, merge, etc.)

At the enterprise where I would like to deploy this design, one of the first handicaps I faced is that there is no place in the filesystem hierarchy above the clearcase vob where I could put the .git folder so that it maps over the clearcase views. One solution that came to my mind was to ask the sysadmins to install the fuse module, but my request was refused for system security reasons.

Another alternative, it seemed to me, was to run a virtual machine myself and mount the vobs there under a subfolder of a folder that contains .git. Running an arbitrary virtual machine was not an option due to my access limitations, so only UML (User-Mode Linux) could come into question. UML would mean building a kernel, building a minimal filesystem with the necessary tools, etc.

Accidentally, however, I came across a simpler solution: Virtual Square's umview. Virtual Square’s “main goal is to create an unified environment that allows virtual machines, systems and networks to communicate and interact”. Umview is a tool that is part of the project. It traps practically any system call issued by any program that a user runs and transforms it as instructed. For example, you can configure it to transform “read /mydiskspace/project.git/clearcase/vob1/module1/src/file1.c” into “read /clearcase/vob1/module1/src/file1.c”. That is, “/clearcase” is virtually mounted at “/mydiskspace/project.git/clearcase” inside the virtual environment. Once that is in place, you can create the mirroring git repository in /mydiskspace/project.git and feed it with source data from the virtually mounted path, such as /mydiskspace/project.git/clearcase/vob1/module1/src/file1.c.

How would you concretely issue your commands? Here they are (see the comments):

#>umview $SHELL #start a shell under the virtual environment

#>#you are now in the virtual environment but you have the “same” view

#>#until you start to “manipulate” it

#>um_add_service viewfs #load the module that handles virtual mounts

#>mount -t viewfs /clearcase/vob1/ /mydiskspace/project.git/clearcase

#>cd /mydiskspace/project.git

#>git init  #if this is the first git action

#>git add . #fill the repository with your files from the clearcase view

It is obvious that in the example above you are in the context of a view. You might be able to create wrapper commands around git that always execute in this “manipulated” environment.

In the following articles I will describe how I feed the git repo with history from clearcase. To start working it is not strictly necessary to have the history, but it is a nice convenience if you want to access the history from git.

Note:

  • It is mentioned on the umview website that the viewfs module is in an early development phase, so it might be a bit buggy. If it does not work for you, try the umfuse module
  • The umview version that comes with Ubuntu Jaunty does not work well on my test machine, so I had to build one from the sources

mobifs eifs fmfs

I wrote to the fuse mailing list about eifs/fmfs/mobifs. I am still not sure which name fits best; for simplicity I will refer to it as mobifs. Based on the feedback from the mailing list, it became clear to me that mobifs can be implemented with the help of fuse. Implementing it in the kernel, however, would avoid the double context switch, the copying of buffers, etc. One of the powerful features of fuse is that you can develop a proof of concept for any kind of filesystem and then, if performance is an issue, recode a part of it to fit into the kernel. The userspace solution for mobifs will be straightforward to implement; a kernel module may come into being at a later stage.

In summary, the goal of mobifs (everything is filesystem) is to allow transparent read/write access to the internal tree structure of files that have one, presenting that structure as a traditional directory tree and the leaves as text that can be edited without compromising the format expected by the original tools that handle these files.

The driving motivation behind it is also to avoid learning hundreds of command line parameters of hundreds of tools that access such files, and to be able to access the data hidden behind them with traditional viewers, editors or other tools.

For example, an SQLite database file has such a directory structure. Instead of issuing sql statements, one would be able to access such a file like this:

#>vi blogs.sqllite@sqllite/tables/blogs/rows/by/uniqueid/mobifs-eifs-story/content

In this example “blogs” is the table containing the blog entries, “by” means that I want to access data by column name, “uniqueid” is that column, “mobifs-eifs-story” is the unique id of the blog entry, and “content” is the field containing the content.
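To make the mapping concrete, here is a rough Python sketch of what a driver could do with the part of the path after the `@…/`. It is an illustration under my own assumptions, not a design: the function name `read_field` is hypothetical, only the `tables/<table>/rows/by/<column>/<key>/<field>` shape is handled, and it uses Python's standard `sqlite3` module instead of a real filesystem driver.

```python
import re
import sqlite3

# Plain SQL identifiers only; anything else in a path component is rejected.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def read_field(conn, inner_path):
    """Resolve tables/<table>/rows/by/<column>/<key>/<field> to one value."""
    parts = inner_path.strip("/").split("/")
    if len(parts) != 7 or (parts[0], parts[2], parts[3]) != ("tables", "rows", "by"):
        raise ValueError("unsupported path: " + inner_path)
    table, column, key, field = parts[1], parts[4], parts[5], parts[6]
    # Table and column names cannot be bound as parameters, so validate them.
    for name in (table, column, field):
        if not IDENTIFIER.match(name):
            raise ValueError("bad identifier: " + name)
    row = conn.execute(
        f'SELECT "{field}" FROM "{table}" WHERE "{column}" = ?', (key,)
    ).fetchone()
    return None if row is None else row[0]
```

With a connection opened on the database file, `read_field(conn, "tables/blogs/rows/by/uniqueid/mobifs-eifs-story/content")` returns what `vi` would show in the example above. Write support would need the reverse mapping, from the saved text to an UPDATE statement.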

If your blog database were a mysql database on a remote server, you would do something similar:

#>vi blogs.db.id@mysql/tables/blogs/rows/by/uniqueid/mobifs-eifs-story/content

In this case blogs.db.id would contain the necessary info on how to access the database (server ip, db name, username, password).

eifs/fmfs (everything is filesystem/file mapping filesystem)

I am doing a proof of concept for a virtual filesystem extension. I wrote the following email to the linux filesystem mailing list.

Hello mailing list members,

I am not sure about the best name, but I am sure it is something very useful. So I will call it mobifs, hope you do not mind :) I am also not sure whether there is something like that out there, but in my 15 years of linux experience I never met anything like what I would like to talk about.
What I would like to have on my linux boxes is to access data the following way:

#>cat /dev/sda1@/partition1/etc/inittab@/defaultrunlevel #usecase 0

#>cat /data/xmlfiles/apsettings.xml@xml/main/entry1/subentry/value #usecase 1

#>cat /data/xmlfiles/apsettings.xml@/main/entry1/subentry/value #note @ instead of @xml #usecase 2

#>cat /data/images/linux.img/@/partition1/etc/dhcpd/dhcpd.conf@group/next-server #note twice the @ #usecase 3

I think you got the point. It is actually not a real filesystem, but an extension to the virtual filesystem. How would it work? If the virtual filesystem subsystem of the kernel finds an @, it would redirect the call to the mobifs main module. Mobifs will check whether there is any string between the @ and the next /. If yes, it will look for an installed driver with that name. In usecase 1, it will look for a driver that reads xml files. If there is one, it will call the driver and pass the rest of the string to it. The xml driver will interpret the path request and output the requested entry. If nothing is specified, as in usecase 0 (note: the annoying temporary mount is not needed!!!), mobifs will try to guess, and if it finds a matching file type (as the file command or mount does) it will pass the path to the driver registered for that type, if there is one. Drivers can be partition readers (I do not know the official technical name :)), existing filesystem drivers, etc. It should be possible to have both kernel drivers and userspace drivers. It seems rather logical that for reading filesystems one would use kernelspace drivers, while for trivial file mapping operations a userspace program would fit better.

Usecase 0 has @ twice. This means double “redirection”, or piping. Mobifs should feed the output of one driver into the input of the next, and so on. In usecase 0, the first driver would read the filesystem on sda1 and serve the reads of the driver that is able to read inittab files.

If something like that came into being, I would see its evolution in 2 main steps: first create read-only support, and then add write support. The same would happen with the drivers…

I was thinking about this for the last few days and started to hack around the kernel, but before going in the wrong direction I wanted to ask more experienced people about what to do and what not to do!

For some more imagined use cases, see the P.S.

rgrds,
mobi phil

P.S.

#>cat /databases/mysqldatabase/db.sql@mysql/tables/table1/2345/rowname
#>cat /databases/mysqldatabase/db.sql@mysql/tables/table1/rowname/1232
#>cat /databases/mysqldatabase/db.sql@mysql/tables/table1/rowname

#>cat /xxx/xx/xx/xx/elfimage@elf/symbols

#>cat /sourcecode/cpp/thismodule/thatfile.h@cpp/methodlist
#>cat /sourcecode/cpp/thismodule/thatfile.h@cpp/includelist
#>cat /sourcecode/cpp/thismodule/thatfile.cpp@cpp/methods/thatmethod


#>tail -f /var/log/messages@grep/error #maybe this is less useful
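
Back outside the email: the dispatch rule it describes (split at every @, take the driver name up to the next /, guess when the name is empty, pipe one driver into the next) is easy to prototype in userspace. The Python sketch below is only that, a prototype under my own assumptions. The names `parse_mobifs_path` and `resolve`, and the idea of drivers as plain functions taking `(data, inner_path)`, are hypothetical and not an existing kernel or fuse interface.

```python
def parse_mobifs_path(path):
    """Split a path at each '@' into the backing file and its driver stages.

    Returns (base, [(driver_name_or_None, inner_path), ...]).
    """
    base, *stages = path.split("@")
    chain = []
    for stage in stages:
        driver, slash, inner = stage.partition("/")
        # An empty driver name means "guess from the content", like file(1).
        chain.append((driver or None, slash + inner))
    return base, chain


def resolve(path, read_file, drivers, guess_driver):
    """Pipe the backing file through every stage, one driver after another."""
    base, chain = parse_mobifs_path(path)
    data = read_file(base)
    for name, inner in chain:
        driver = drivers[name or guess_driver(data)]
        data = driver(data, inner)
    return data
```

For usecase 1, `parse_mobifs_path` gives the backing file `/data/xmlfiles/apsettings.xml` and a single stage `("xml", "/main/entry1/subentry/value")`. A path with two @ yields two stages, and `resolve` feeds the output of the first driver to the second.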

Linux loopback device partitions

The Linux kernel has powerful features, but it can be disappointing to discover that trivial details do not work, or do not work in the obvious way. One such example is the loopback device. You can mount a file as a loopback device, but you cannot access its partitions as “sub-devices” as you can with physical disks. There are workarounds, such as mounting a partition by its known start and length, but that might be a bit risky.
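
The start and length for that workaround can be read from the image itself instead of being typed by hand, which removes part of the risk. A small Python sketch, assuming a classic MBR partition table with 512-byte sectors; extended partitions and GPT are not handled, and the function name `mbr_partitions` is my own.

```python
import struct

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def mbr_partitions(image_path):
    """Yield (index, byte offset, byte length) for each used primary MBR entry."""
    with open(image_path, "rb") as image:
        sector = image.read(SECTOR_SIZE)
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR signature found")
    for index in range(4):
        entry = sector[446 + 16 * index : 446 + 16 * (index + 1)]
        # Bytes 8..15 of an entry: start LBA and sector count, little-endian.
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if entry[4] != 0 and sector_count:  # type 0 marks an unused slot
            yield index + 1, start_lba * SECTOR_SIZE, sector_count * SECTOR_SIZE
```

The two byte values can then be passed to mount as the `offset=` and `sizelimit=` loop options, for example `mount -o loop,offset=<offset>,sizelimit=<length> image.img /mnt`.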

Links and morse or what creole and markdown failed to do

It seems that all the simplified syntaxes for writing html out there fail to consider efficiency. They seem to blindly consider only usability. A relevant example is the link syntax. Both markdown and creole (the syntax for wikipedia-like wikis) fail to provide the simplest form of a link, that is [this is a link] or [http://mobiphil.com]. Links are among the most frequent syntax elements on html pages, so why should we not reserve the simplest syntax element ([this is a link]) for them? Creole wants to force me to use a double “[”, that is [[this is a link]]; markdown has an even more complicated syntax: [name](http://mobiphil.com).

Again, I would prefer the creole way, but with a single “[”. There were objections that “[” is also used on its own. My answer: do it as it has been done in computing for ages. Pick an escape character, which can simply be “\”, and write “\[” when you want a literal “[”.
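
The rule is small enough to fit in a few lines. Here is a Python sketch of it, with assumptions of mine: the function name `render_links` is made up, the text between the brackets is used as the link target as it stands (a wiki would first map a page name to a URL), and “\]” is unescaped as well for symmetry.

```python
import html
import re

# An unescaped "[", then the link text, then "]".
LINK = re.compile(r"(?<!\\)\[([^\[\]]+)\]")


def render_links(text):
    """Turn [target] into an HTML link; a backslash-escaped bracket stays literal."""

    def to_anchor(match):
        target = html.escape(match.group(1), quote=True)
        return f'<a href="{target}">{target}</a>'

    linked = LINK.sub(to_anchor, text)
    # Drop the escape character in front of literal brackets.
    return linked.replace("\\[", "[").replace("\\]", "]")
```

So `[http://mobiphil.com]` becomes a link, while `array\[0]` comes out as plain `array[0]`.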

supercat the super tool

I think nobody doubts that viewing a colorized text file helps a lot in finding information. I was using vim for that purpose, but opening huge files in it is often not very productive. I sometimes use http://supercat.nosredna.net/, however I am always disappointed that I do not find syntax config files for all output types. For example, I often examine the output of strace, but I never had time to write a config file for it, nor did I find one on the internet, so … wait to load it in vim…

windows mobile api reference

http://msdn.microsoft.com/en-us/library/bb158486.aspx