[tahoe-dev] GSoC Share Rebalancing and Repair Proposal

Mon Apr 22 00:17:35 UTC 2013

Hi everyone, over the last few days I have been working on a proposal for
GSoC to address share rebalancing and repair. I've copied
the proposal below (with some of my personal contact information redacted
:] ). If you see something wrong in my proposal, have any questions, or
have any suggestions, please let me know.

Thanks!
Mark Berger

Organization: Tahoe-LAFS
=============

Student Info:
=============
Mark J. Berger
Time Zone: Pacific
Time Zone during GSoC: Eastern
IRC Handle: Mark_B at irc.freenode.net
Github: markberger
Email: mjberger [at] stanford.edu

University Info:
================
University: Stanford University
Major: Computer Science
Current Year: Freshman
Expected Graduation: June 2016
Degree: BS

About Me:
=========

I'm a freshman at Stanford University studying computer science. Right now
I am finishing up my core requirements and will be pursuing the artificial
intelligence track or the systems track within the major. My interests lie
in machine learning, large distributed systems, and web applications.

I began programming during an internship at Four Directions Productions in
2011, where I learned how to use Python in conjunction with Maya. The
majority of my college coursework has been in C or C++ on Linux with a
little Java. This has made me familiar with tools such as GCC, GDB and
Valgrind.

While I have never contributed to an open source project before, I am
making an effort to learn about Tahoe-LAFS and become familiar with its
code base and community. Using a virtual machine, I've successfully
installed Tahoe on an Ubuntu server and connected to the Public Test Grid.
I've also subscribed to the mailing list, connected to the IRC channel, and
successfully pulled the code off of Github. While I know my lack of
experience in open source is a shortcoming, I am completely dedicated to
using GSoC's Community Bonding Period to overcome any obstacles before the
official coding period begins.

Project Title: Share Rebalancing and Repair in Tahoe-LAFS
=========================================================

Abstract:
=========

The "servers of happiness" algorithm has improved Tahoe's ability to
maximize redundancy by ensuring a given subset of all shares is placed on
distinct nodes. However, this process is not used to upload mutable
files. Mutable uploads still use the old "shares of happiness" algorithm,
which has well-documented downsides. Additionally, file repair does not
necessarily redistribute files to new servers when nodes have been added.
This creates issues in terms of redundancy and long-term server health.
Implementing proper file rebalancing for all file types during file upload,
modification, and repair will enhance the reliability of the Tahoe system
and take full advantage of erasure encoding.
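
To make the distinction concrete, here is a minimal sketch of how a
"servers of happiness" value can be computed for a share placement. It
treats happiness as the size of a maximum matching between servers and the
shares they hold, found with simple augmenting paths. The function name and
the dict-of-sets input are assumptions made for illustration and are not
taken from the Tahoe-LAFS code base.

```python
def servers_of_happiness(placement):
    """Return the size of a maximum matching between servers and shares.

    `placement` (a hypothetical structure) maps a server id to the set of
    share numbers that server holds.
    """
    share_owner = {}  # share number -> server currently matched to it

    def try_assign(server, visited):
        # Try to match `server` to some share, re-matching other servers
        # along an augmenting path if that frees a share up.
        for share in sorted(placement[server]):
            if share in visited:
                continue
            visited.add(share)
            owner = share_owner.get(share)
            if owner is None or try_assign(owner, visited):
                share_owner[share] = server
                return True
        return False

    return sum(1 for server in placement if try_assign(server, set()))


# Four shares on a single server: four shares exist, but only one
# distinct server can be matched to a share.
print(servers_of_happiness({"A": {0, 1, 2, 3}}))

# The same number of shares spread over distinct servers.
print(servers_of_happiness({"A": {0}, "B": {1}, "C": {2}, "D": {3}}))
```

A metric that only counts placed shares would rate both placements above
equally, while the matching-based value separates them.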

Deliverables:
=============

1. Mutable files automatically distribute over nodes according to the
"servers of happiness" algorithm whenever uploaded, modified, or repaired
(ticket #232).

2. Repair will redistribute files according to the "servers of happiness"
algorithm and only renew the appropriate leases (ticket #699).

3. Documentation changed to correctly reflect the new feature set.

4. Create a test suite to be used on a network of virtual machines in order
to test file rebalancing.

Time Line:
==========

Note: I would like to have a code review session with my mentor on a weekly
basis at minimum, especially at the beginning of the program. Those
sessions are left off the time line to avoid redundancy.

May 27th - June 17th (Community Bonding):
-----------------------------------------

- Remain available via IRC and email
- Closely follow the development email list
- Isolate and understand the classes which pertain to the current
  implementations of the servers of happiness algorithm to determine which
  parts can be reused.
- Discuss with my mentor(s) and the community to determine whether code
  should be refactored to apply to both immutable and mutable files or if
  the two need to remain distinct for design reasons
- Discuss with my mentor(s) and the community the best way to go about
  testing file rebalancing.

Note: June 3rd through the 14th is my final exams period, and I will also be
packing so that I can go home to Upstate NY. Since I will be very busy
during this time, not all of the above may be accomplished in time to start
coding. My classes do not resume until September 23rd, so I can push my
time line back a week or two if need be.

Jun 17th - 28th
---------------
- Implement "servers of happiness" for mutable files during the initial
  file upload and file modification

Jul 1st - 12th
--------------
- Thoroughly document code
- Write test scripts for larger networks
- Test code using virtual machines or the test scheme decided on during the
  Community Bonding Period

Jul 15th - 19th
---------------
- Clean up test scripts
- Thoroughly document test scripts
- Fix minor bugs
- Continue to consider and test edge cases

Note: "Servers of happiness" for mutable files should be in a mergeable state
      with tests before the midway point on July 29th.

Jul 22nd - Aug 2nd
------------------

- Modify repair code to use the "servers of happiness" algorithm for both
  immutable and mutable files. This should be accomplished by utilizing the
  existing code from the initial upload process

- Edit the lease renewal mechanism so that as few leases as possible are
  renewed during rebalancing
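
One possible reading of "minimal lease renewal" is sketched below: compare
the current share placement with the placement chosen for repair, renew
leases only on shares that stay where they are, and upload the shares that
move. This is an illustration under stated assumptions, not the design
settled on for ticket #699. The function name, the dict-of-sets inputs, and
the choice to leave leases on abandoned shares unrenewed are all
hypothetical.

```python
def plan_repair(current, target):
    """Split a repair into uploads, lease renewals, and untouched leases.

    `current` and `target` (hypothetical structures) map a server id to
    the set of share numbers it holds now / should hold after repair.
    Each result is a set of (server, share number) pairs.
    """
    current_pairs = {(srv, num) for srv, nums in current.items() for num in nums}
    target_pairs = {(srv, num) for srv, nums in target.items() for num in nums}

    to_upload = target_pairs - current_pairs    # shares that must be placed
    to_renew = target_pairs & current_pairs     # shares staying in place
    not_renewed = current_pairs - target_pairs  # assumption: left to expire
    return to_upload, to_renew, not_renewed


# Three shares crowded on server A are spread across A, B, and C.
current = {"A": {0, 1, 2}}
target = {"A": {0}, "B": {1}, "C": {2}}
to_upload, to_renew, not_renewed = plan_repair(current, target)
print(sorted(to_upload), sorted(to_renew), sorted(not_renewed))
```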

Aug 5th - 16th
--------------

- Thoroughly document code
- Extend tests for mutable files to encompass rebalancing during file repair

Aug 19th - 23rd
---------------

- Clean up test scripts
- Thoroughly document test scripts
- Fix minor bugs
- Continue to consider and test edge cases

Aug 26th - 30th
---------------

- Change documentation to reflect additional features

The weeks of September 1st and 8th are left blank for flexibility.

Possible projects if the above are accomplished ahead of schedule:
==================================================================

 - Detect if disk(s) on a server are in a near-failure state. If the disk(s)
   are close to failing, notify the administrator, and slowly begin
   redistributing shares to the other storage nodes (tickets #481 and #864).

 - Let the user specify a maximum storage capacity for a given storage node
   based on folder size instead of free space left on the machine.

 - Tahoe backend for Google Drive (ticket #1831).