πfs - the data-free filesystem!

πfs: Never worry about data again!

πfs is a revolutionary new file system that, instead of wasting space storing your data on your hard drive, stores your data in π! You'll never run out of space again - π holds every file that could possibly exist! They said 100% compression was impossible? You're looking at it!

πfs is dead simple to build:

First, install the autoconf, automake, and libfuse packages on your system. For example, on Debian, try:

sudo apt-get install autoconf
sudo apt-get install autotools-dev
sudo apt-get install automake
sudo apt-get install libfuse-dev
./autogen.sh
./configure
make
make install

πfs is dead simple to use:

πfs -o mdd=<metadata directory> <mountpoint>

where the metadata directory is where πfs should store its metadata (such as filenames or the locations of your files in π) and mountpoint is your usual filesystem mountpoint.

What does π have to do with my data?

π (or pi) is one of the most important constants in mathematics and has a variety of interesting properties (which you can read about on Wikipedia).

One of the properties that π is conjectured to have is that it is normal, which is to say that its digits are all distributed evenly, with the implication that it is a disjunctive sequence, meaning that all possible finite sequences of digits will be present somewhere in it. If we consider π in base 16 (hexadecimal), it is trivial to see that if this conjecture is true, then all possible finite files must exist within π. The first record of this observation dates back to 2001.

From here, it is a small leap to ask: if π contains all possible files, why are we wasting exabytes of space storing those files when we could just look them up in π?

Every file that could possibly exist?

That's right! Every file you've ever created, or anyone else has created or will create! Copyright infringement? It's just a few digits of π! They were always there!

But how do I look up my data in π?

As long as you know the index into π of your file and its length, it's a simple task to extract the file using the Bailey–Borwein–Plouffe formula. Similarly, you can use the formula to initially find the index of your file.
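As a rough illustration of the extraction step, here is a minimal sketch of Bailey–Borwein–Plouffe digit extraction in C. The names `pi_hex_digit` and `pi_byte_at` are hypothetical and are not taken from the πfs source; packing two consecutive hex digits into one byte is likewise an assumption. Because it uses double-precision arithmetic, the sketch is only trustworthy for modest positions.

```c
#include <assert.h>
#include <stdint.h>

/* (16^exp) mod m by square-and-multiply; m must be non-zero. */
static uint64_t pow16_mod(uint64_t exp, uint64_t m) {
    assert(m != 0);
    uint64_t result = 1 % m;
    uint64_t base = 16 % m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Fractional part of sum over k of 16^(n-k) / (8k + j). */
static double bbp_series(unsigned j, uint64_t n) {
    double sum = 0.0;
    /* Terms with k <= n: reduce the numerator modulo the denominator. */
    for (uint64_t k = 0; k <= n; k++) {
        uint64_t denom = 8 * k + j;
        sum += (double)pow16_mod(n - k, denom) / (double)denom;
        sum -= (double)(uint64_t)sum;  /* keep only the fractional part */
    }
    /* Terms with k > n shrink by a factor of 16 each; a few suffice. */
    double scale = 1.0 / 16.0;
    for (uint64_t k = n + 1; scale > 1e-17; k++) {
        sum += scale / (double)(8 * k + j);
        scale /= 16.0;
    }
    return sum - (double)(uint64_t)sum;
}

/* Hex digit of pi at offset n after the point (offset 0 is the '2' in 3.243F...). */
unsigned pi_hex_digit(uint64_t n) {
    double x = 4.0 * bbp_series(1, n) - 2.0 * bbp_series(4, n)
             - bbp_series(5, n) - bbp_series(6, n);
    x -= (double)(int64_t)x;  /* drop the integer part (truncates toward zero) */
    if (x < 0.0) x += 1.0;    /* fold negative values back into [0, 1) */
    return (unsigned)(x * 16.0);
}

/* Assumption: a byte is the pair of hex digits starting at offset n. */
unsigned pi_byte_at(uint64_t n) {
    return (pi_hex_digit(n) << 4) | pi_hex_digit(n + 1);
}
```

Each call costs time roughly proportional to the position, so reading a byte far into π is much slower than reading one near the start.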

Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.

In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
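The per-byte scheme can be sketched as a round trip: replace each byte with the offset where it first appears, then read the bytes back from those offsets. To stay short, the sketch below searches a hard-coded table of the first 32 hex digits of π rather than computing digits on demand; the function names and the use of `long` offsets are illustrative assumptions, not the πfs on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First 32 hex digits of the fractional part of pi (3.243F6A88...).
   Demo only: a real lookup would compute digits instead of using a table. */
static const char PI_HEX[] = "243F6A8885A308D313198A2E03707344";

static int hex_value(char c) {
    return (c >= '0' && c <= '9') ? c - '0' : c - 'A' + 10;
}

/* Byte formed by the two hex digits starting at digit offset pos,
   or -1 if pos is outside the table. */
int table_byte_at(size_t pos) {
    if (pos + 1 >= strlen(PI_HEX)) return -1;
    return (hex_value(PI_HEX[pos]) << 4) | hex_value(PI_HEX[pos + 1]);
}

/* Smallest digit offset at which target occurs, or -1 if it is not in the table. */
long find_byte(unsigned char target) {
    for (size_t pos = 0; pos + 1 < strlen(PI_HEX); pos++) {
        if (table_byte_at(pos) == target) return (long)pos;
    }
    return -1;
}

/* Replace every byte with its offset. Returns 0 on success, or -1 if some
   byte was not found (offsets is then only partially filled). */
int encode(const unsigned char *data, size_t len, long *offsets) {
    assert(data != NULL && offsets != NULL);
    for (size_t i = 0; i < len; i++) {
        offsets[i] = find_byte(data[i]);
        if (offsets[i] < 0) return -1;
    }
    return 0;
}

/* Recover the original bytes from offsets produced by encode(). */
void decode(const long *offsets, size_t len, unsigned char *out) {
    assert(offsets != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[i] = (unsigned char)table_byte_at((size_t)offsets[i]);
    }
}
```

Note that each one-byte input becomes a multi-byte offset here, so the metadata is larger than the data it describes.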

So I've looked up my bytes in π, but how do I remember where they are?

Well, you've obviously got to write them down somewhere; you could use a piece of paper, but remember all that storage space we saved by moving our data into π? Why don't we store our file locations there!?! Even better, the location of our files in π is metadata and, as we all know, metadata is becoming more and more important in everything we do. Doesn't it feel great to have generated so much metadata? Why waste time with old-fashioned data when you can just deal with metadata, and lots of it!

Yeah, but what happens if I lose my file locations?

No problem, the locations are just metadata! Your files are still there, sitting in π - they're never going away, are they?

Why is this thing so slow? It took me five minutes to store a 400 line text file!

Well, this is just an initial prototype, and don't worry, there's always Moore's law!

Where do we go from here?

There's lots of potential for the future!

  • Variable run length search and lookup!
  • Arithmetic Coding!
  • Parallelizable lookup!
  • Cloud based π lookup!
  • πfs for Hadoop!
Comments
  • PiFS installs large volumes of objectionable content and copyright violations

    While parsing the contents of PiFS I was shocked to find that it contained a large amount of objectionable and pornographic content. There is a set of indexes (which I won't reproduce here for legal reasons) that point to files which I believe may cause the user to face serious liability if PiFS is discovered on their system.

    In addition, there are indexes which contain numerous copyrighted materials, as well as keys to break various DRM schemes. I believe that the use of PiFS may constitute a DMCA violation.

  • Theoretical question regarding the index

    Is it known (or, technically, hypothesized) what the expected value of the index in pi of a given random sequence of n bits is? After all, if the logarithm of the index (i.e., the number of bits in the index) grows linearly in the number of bits being stored, then storing the metadata is perhaps just as expensive as storing the data itself.
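
    A rough answer, as a sketch rather than a proof: a position in π fixes the n bits that start there, so two different n-bit sequences can never share a first-occurrence index. Assuming every n-bit sequence does occur (the normality conjecture), the 2^n first-occurrence indexes are distinct non-negative integers, which bounds their average from below. Under the further heuristic assumption that the bits of π behave like independent fair coin flips, the expected index is on the order of 2^n. Here idx(s) is notation introduced for this note, meaning the first-occurrence index of the sequence s:

    ```latex
    \frac{1}{2^n}\sum_{s \in \{0,1\}^n} \operatorname{idx}(s)
      \;\ge\; \frac{1}{2^n}\sum_{i=0}^{2^n-1} i
      \;=\; \frac{2^n - 1}{2},
    \qquad
    \mathbb{E}\bigl[\operatorname{idx}(s)\bigr] \sim 2^n
    \;\Longrightarrow\;
    \log_2 \operatorname{idx}(s) \approx n .
    ```

    So the index of a typical n-bit sequence needs about n bits itself, which matches the concern raised in the question.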

  • Pointer cast errors during make

    gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I.. -D_FILE_OFFSET_BITS=64 -I/usr/local/include/fuse   -Wall -Werror -Wextra -Wno-unused-parameter    -g -O2 -MT πfs.o -MD -MP -MF .deps/πfs.Tpo -c -o πfs.o πfs.c
    πfs.c: In function 'pifs_opendir':
    πfs.c:258:14: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
    πfs.c: In function 'pifs_readdir':
    πfs.c:265:14: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    πfs.c: In function 'pifs_releasedir':
    πfs.c:290:22: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    πfs.c: In function 'pifs_fsyncdir':
    πfs.c:297:18: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    cc1: all warnings being treated as errors
    
  • Broken link in Readme.md

    In the section "So I've looked up my bytes in π, but how do I remember where they are?", the "we all know" link seems to be broken.

  • pifs_write and pifs_read

    During writing:

    for (index = 0; index < SHRT_MAX; index++) {
      if (get_byte(index) == *buf) {
        break;
      }
    }
    ret = write(info->fh, &index, sizeof index);
    

    If no index makes "get_byte(index) == *buf" true, index will be SHRT_MAX or 0...

    So when you decode it as

    *buf = (char) get_byte(index);
    

    you will receive the wrong value.
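
    One possible way to address this, sketched under assumptions: report "not found" explicitly instead of letting the loop fall through with index equal to SHRT_MAX. The helper name `find_index`, the function-pointer parameter, and the `demo_get_byte` stand-in are all hypothetical; the real get_byte() in πfs.c may have a different signature.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Assumed shape of the byte source: returns the byte at a given index. */
    typedef int (*get_byte_fn)(int index);

    /* Searches indexes 0..SHRT_MAX-1 for wanted. On success, stores the index
       in *index_out and returns true. Returns false if nothing matched, so the
       caller can fail the write instead of storing a bogus index. */
    bool find_index(get_byte_fn get_byte, unsigned char wanted, short *index_out) {
        assert(get_byte != NULL && index_out != NULL);
        for (int index = 0; index < SHRT_MAX; index++) {
            if (get_byte(index) == wanted) {
                *index_out = (short)index;
                return true;
            }
        }
        return false;
    }

    /* Demo stand-in for get_byte(): it never yields values 200..255, which
       makes the not-found path easy to exercise. */
    int demo_get_byte(int index) {
        return index % 200;
    }
    ```

    With this shape, the write path would only call write() when find_index returns true, and return an error otherwise.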

  • install troubles

    So after seeing in issue 8 that I must install autogen and fuse from sourceforge, I did so.

    But I still got...

    ./autogen.sh: 9: ./autogen.sh: autoreconf: not found
    

    At the end of the day, I fixed this by simply...

    sudo apt-get install autoconf
    

    ...and reinstalling fuse from the Ubuntu repos, as the SF one is old. I knew something was fishy considering Linux Mint 15 already has these two packages anyway.

    I suggest updating INSTALL and README with something like this, seeing as how the fix is so simple (it is 2 years old...).

    Thank you for reading. Jack.

  • How does the metadata work, and how can I retrieve my data?

    Is the whole file the index? Or is it split up into different areas?

    I read that this is indexing each individual byte, but does it store the length of your file as well?

  • Shorthand encoding for positions

    Say we wanted to encode "123". The first occurrence of this in Pi is at position 1924. However, a shorter way to encode this would be to store "123", which is shorthand for "the byte at the position in Pi where the byte 123 can be found". This stores the same data as storing "1924", but in a shorter form. This also skips the costly Pi lookup step, drastically improving performance.

    For example, the byte sequence:

    FA 01 7A D7 12 0B

    would be encoded as:

    FA 01 7A D7 12 0B

    A function to convert between plain bytes and shorthand Pi offsets could look like this pseudocode:

    char encode(char original) {
      return byteAtPiPosition(findByteInPi(original));
    }
    
    char decode(char encoded) {
      return byteAtPiPosition(findByteInPi(encoded));
    }
    

    However, we can skip some steps here, for an optimised version:

    char encode(char original) {
      return original;
    }
    
    char decode(char encoded) {
      return encoded;
    }
    

    This would bring many of the advantages of traditional filesystems to PiFS, such as high performance, and reduces the size of the metadata.

    As a bonus, this encoding is fully compatible with traditional filesystem drivers, due to the output metadata being readable as if it were the original data. Therefore, you don't even have to reformat your disk to use this new implementation of PiFS!

    But wait, it gets even better! All we have to do to add support for PiFS to existing drivers, such as EXT4 and NTFS, is to inject the two encode and decode functions into wherever the drivers write and read to the disk. So, a read like this:

    int var = read_from_disk(position);
    

    will have to be changed to this:

    int var = decode(read_from_disk(position));
    

    If we mark the functions with always_inline, or allow the compiler to automatically inline the functions, then it will get converted to this:

    int var = read_from_disk(position);
    

    You may notice that this is completely identical to the original code! This means that we can skip the step of modifying, recompiling and replacing the code entirely!

    Here is a simple 0-step tutorial to switch from a traditional filesystem to this new version of PiFS:

    And you're done!

    Also, since you do not need to modify the code, this even works on proprietary drivers like the Windows NTFS one. In fact, you have already been using it for as long as you've been using a computer, without even knowing it. Amazing!

  • Metadata error

    When I try to make a metadata directory, it says that the directory or file doesn't exist. When I try to and the directory exists, there's an error with fuse. Fuse said it could be fixed with a 'nonempty' mount option, but I can't figure it out.

  • Is there an automated tool to decrypt the metadata?

    So unfortunately, I lost my main data drive, but the good news is I have it all backed up with πfs. Unfortunately I am not math-savvy enough to implement the Bailey–Borwein–Plouffe formula to restore the data myself. Surely someone must have made a tool by now?
