[tahoe-dev] [tahoe-lafs] #329: dirnodes could cache encrypted/serialized entries for speed

tahoe-lafs trac at allmydata.org
Tue Jul 7 14:38:33 UTC 2009


#329: dirnodes could cache encrypted/serialized entries for speed
---------------------------+------------------------------------------------
 Reporter:  warner         |           Owner:  kevan    
     Type:  enhancement    |          Status:  new      
 Priority:  minor          |       Milestone:  undecided
Component:  code-dirnodes  |         Version:  0.8.0    
 Keywords:  dirnode        |   Launchpad_bug:           
---------------------------+------------------------------------------------

Comment(by zooko):

 Summarizing the measurements into a linear {{{A+Bx}}} model is not the way
 I do it.  Instead I look at a handful of data points, like this:

 {{{
 benchmarking <function unpack at 0x32c0970>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.09, worst:    0.10 (of      5), reps/s:     11, ave rate:    732
 N:     256, time: best:    0.35,   2th-best:    0.35, ave:    0.36,   2th-worst:    0.35, worst:    0.39 (of      5), reps/s:      2, ave rate:    714
 N:    1024, time: best:    1.51,   2th-best:    1.51, ave:    1.55,   2th-worst:    1.51, worst:    1.72 (of      5), reps/s:      0, ave rate:    659
 N:    4096, time: best:   10.48,   2th-best:   10.49, ave:   10.66,   2th-worst:   10.53, worst:   11.29 (of      5), reps/s:      0, ave rate:    384
 benchmarking <function pack at 0x32c0930>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.09, worst:    0.09 (of      5), reps/s:     11, ave rate:    747
 N:     256, time: best:    0.35,   2th-best:    0.35, ave:    0.35,   2th-worst:    0.35, worst:    0.35 (of      5), reps/s:      2, ave rate:    728
 N:    1024, time: best:    1.52,   2th-best:    1.53, ave:    1.53,   2th-worst:    1.53, worst:    1.53 (of      5), reps/s:      0, ave rate:    670
 N:    4096, time: best:   10.48,   2th-best:   10.49, ave:   10.50,   2th-worst:   10.52, worst:   10.53 (of      5), reps/s:      0, ave rate:    390
 benchmarking <function unpack_and_repack at 0x32c09b0>
 N:      64, time: best:    0.08,   2th-best:    0.09, ave:    0.09,   2th-worst:    0.09, worst:    0.09 (of      5), reps/s:     11, ave rate:    744
 N:     256, time: best:    0.35,   2th-best:    0.35, ave:    0.35,   2th-worst:    0.35, worst:    0.35 (of      5), reps/s:      2, ave rate:    727
 N:    1024, time: best:    1.52,   2th-best:    1.52, ave:    1.54,   2th-worst:    1.53, worst:    1.60 (of      5), reps/s:      0, ave rate:    665
 N:    4096, time: best:   10.49,   2th-best:   10.50, ave:   10.51,   2th-worst:   10.51, worst:   10.52 (of      5), reps/s:      0, ave rate:    390
 }}}
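
 Output in the shape above comes from a small timing harness along these
 lines (a minimal sketch only; the function names and the choice of N
 values are illustrative, not the exact script behind the numbers):

 {{{
 import time

 def bench(func, n_values=(64, 256, 1024, 4096), reps=5):
     # Time func(N) `reps` times for each N and report the whole spread,
     # rather than fitting a single A+Bx model to the measurements.
     print("benchmarking %r" % (func,))
     for n in n_values:
         times = []
         for _ in range(reps):
             start = time.time()
             func(n)
             times.append(time.time() - start)
         times.sort()
         ave = sum(times) / len(times)
         print("N: %7d, time: best: %8.2f, 2th-best: %8.2f, ave: %8.2f, "
               "2th-worst: %8.2f, worst: %8.2f (of %6d), reps/s: %6d, "
               "ave rate: %6d"
               % (n, times[0], times[1], ave, times[-2], times[-1],
                  reps, int(1 / ave), int(n / ave)))
 }}}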

 This is a good example of why I do it this way: it isn't linear!  So any
 {{{A+Bx}}} summary would be off.  I already have a patch that fixes the
 observable non-linearity there; I might as well commit it so that everyone
 can see it.  There -- [3969], entitled "directories: keep track of your
 position as you decode netstring after netstring from an input buffer
 instead of copying the trailing part | This makes decoding linear in the
 number of netstrings instead of O(N^2^)".
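
 The gist of that change, as a sketch (illustrative only, not the actual
 code from [3969]): parse each netstring at a running offset into the
 original buffer rather than slicing off the consumed prefix and copying
 the remainder after every entry.

 {{{
 def split_netstrings(data):
     # Decode a run of netstrings ("<length>:<bytes>,") from data.
     # Tracking a position into the original buffer keeps the work linear
     # in the number of entries; re-copying the unparsed tail after each
     # entry is what made the old code quadratic.
     results = []
     position = 0
     while position < len(data):
         colon = data.index(":", position)
         length = int(data[position:colon])
         start = colon + 1
         end = start + length
         if data[end:end + 1] != ",":
             raise ValueError("malformed netstring")
         results.append(data[start:end])
         position = end + 1  # skip the trailing comma
     return results
 }}}
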
 After this patch, the same benchmark on the same machine says:

 {{{
 benchmarking <function unpack at 0x33429b0>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.08, worst:    0.10 (of      5), reps/s:     11, ave rate:    750
 N:     256, time: best:    0.33,   2th-best:    0.33, ave:    0.34,   2th-worst:    0.33, worst:    0.37 (of      5), reps/s:      2, ave rate:    755
 N:    1024, time: best:    1.28,   2th-best:    1.28, ave:    1.32,   2th-worst:    1.28, worst:    1.48 (of      5), reps/s:      0, ave rate:    776
 N:    4096, time: best:    4.83,   2th-best:    4.83, ave:    4.97,   2th-worst:    4.83, worst:    5.51 (of      5), reps/s:      0, ave rate:    824
 benchmarking <function pack at 0x3342970>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.08,   2th-worst:    0.08, worst:    0.09 (of      5), reps/s:     11, ave rate:    760
 N:     256, time: best:    0.33,   2th-best:    0.33, ave:    0.33,   2th-worst:    0.33, worst:    0.33 (of      5), reps/s:      2, ave rate:    767
 N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.29,   2th-worst:    1.29, worst:    1.29 (of      5), reps/s:      0, ave rate:    794
 N:    4096, time: best:    4.83,   2th-best:    4.83, ave:    4.83,   2th-worst:    4.83, worst:    4.84 (of      5), reps/s:      0, ave rate:    847
 benchmarking <function unpack_and_repack at 0x33429f0>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.08,   2th-worst:    0.08, worst:    0.09 (of      5), reps/s:     11, ave rate:    759
 N:     256, time: best:    0.33,   2th-best:    0.33, ave:    0.33,   2th-worst:    0.33, worst:    0.33 (of      5), reps/s:      2, ave rate:    767
 N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.30,   2th-worst:    1.29, worst:    1.36 (of      5), reps/s:      0, ave rate:    786
 N:    4096, time: best:    4.82,   2th-best:    4.83, ave:    4.85,   2th-worst:    4.87, worst:    4.90 (of      5), reps/s:      0, ave rate:    844
 }}}

 As to your question about which part is the expensive one: after the
 optimization patch from Kevan plus a few more that I have here in my
 sandbox, benchmarking shows:

 {{{
 benchmarking <function unpack at 0x33003f0>
 N:      64, time: best:    0.07,   2th-best:    0.07, ave:    0.08,   2th-worst:    0.07, worst:    0.08 (of      5), reps/s:     13, ave rate:    849
 N:     256, time: best:    0.29,   2th-best:    0.30, ave:    0.31,   2th-worst:    0.32, worst:    0.32 (of      5), reps/s:      3, ave rate:    832
 N:    1024, time: best:    1.24,   2th-best:    1.24, ave:    1.30,   2th-worst:    1.27, worst:    1.52 (of      5), reps/s:      0, ave rate:    787
 N:    4096, time: best:    4.28,   2th-best:    4.29, ave:    4.40,   2th-worst:    4.31, worst:    4.82 (of      5), reps/s:      0, ave rate:    931
 benchmarking <function pack at 0x33003b0>
 N:      64, time: best:    0.07,   2th-best:    0.07, ave:    0.08,   2th-worst:    0.08, worst:    0.08 (of      5), reps/s:     13, ave rate:    843
 N:     256, time: best:    0.29,   2th-best:    0.29, ave:    0.30,   2th-worst:    0.31, worst:    0.32 (of      5), reps/s:      3, ave rate:    840
 N:    1024, time: best:    1.14,   2th-best:    1.14, ave:    1.16,   2th-worst:    1.17, worst:    1.20 (of      5), reps/s:      0, ave rate:    883
 N:    4096, time: best:    4.30,   2th-best:    4.31, ave:    4.53,   2th-worst:    4.64, worst:    4.99 (of      5), reps/s:      0, ave rate:    904
 benchmarking <function unpack_and_repack at 0x3300430>
 N:      64, time: best:    0.07,   2th-best:    0.07, ave:    0.07,   2th-worst:    0.07, worst:    0.08 (of      5), reps/s:     13, ave rate:    860
 N:     256, time: best:    0.29,   2th-best:    0.29, ave:    0.29,   2th-worst:    0.29, worst:    0.29 (of      5), reps/s:      3, ave rate:    880
 N:    1024, time: best:    1.14,   2th-best:    1.14, ave:    1.15,   2th-worst:    1.14, worst:    1.21 (of      5), reps/s:      0, ave rate:    889
 N:    4096, time: best:    4.28,   2th-best:    4.28, ave:    4.29,   2th-worst:    4.29, worst:    4.30 (of      5), reps/s:      0, ave rate:    955
 }}}

 and profiling shows:

 {{{
 benchmarking <function unpack at 0x3278430>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.09,   2th-worst:    0.08, worst:    0.10 (of      5), reps/s:     11, ave rate:    738
 N:     256, time: best:    0.34,   2th-best:    0.34, ave:    0.34,   2th-worst:    0.34, worst:    0.38 (of      5), reps/s:      2, ave rate:    744
 N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.48,   2th-worst:    1.46, worst:    2.04 (of      5), reps/s:      0, ave rate:    693
 N:    4096, time: best:    4.84,   2th-best:    4.85, ave:    4.97,   2th-worst:    4.87, worst:    5.45 (of      5), reps/s:      0, ave rate:    824
 benchmarking <function pack at 0x32783f0>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.10,   2th-worst:    0.09, worst:    0.18 (of      5), reps/s:      9, ave rate:    611
 N:     256, time: best:    0.33,   2th-best:    0.34, ave:    0.34,   2th-worst:    0.34, worst:    0.34 (of      5), reps/s:      2, ave rate:    764
 N:    1024, time: best:    1.29,   2th-best:    1.29, ave:    1.30,   2th-worst:    1.30, worst:    1.30 (of      5), reps/s:      0, ave rate:    790
 N:    4096, time: best:    4.85,   2th-best:    4.85, ave:    4.88,   2th-worst:    4.87, worst:    4.96 (of      5), reps/s:      0, ave rate:    840
 benchmarking <function unpack_and_repack at 0x3278470>
 N:      64, time: best:    0.08,   2th-best:    0.08, ave:    0.08,   2th-worst:    0.08, worst:    0.09 (of      5), reps/s:     11, ave rate:    754
 N:     256, time: best:    0.33,   2th-best:    0.34, ave:    0.34,   2th-worst:    0.34, worst:    0.34 (of      5), reps/s:      2, ave rate:    764
 N:    1024, time: best:    1.30,   2th-best:    1.30, ave:    1.31,   2th-worst:    1.31, worst:    1.37 (of      5), reps/s:      0, ave rate:    779
 N:    4096, time: best:    4.85,   2th-best:    4.87, ave:    4.89,   2th-worst:    4.91, worst:    4.96 (of      5), reps/s:      0, ave rate:    837

          674783 function calls in 4.909 CPU seconds

    Ordered by: internal time
    List reduced from 75 to 32 due to restriction <32>

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     36830    2.422    0.000    2.591    0.000 base32.py:60(b2a_l)
      7366    0.527    0.000    1.606    0.000 base32.py:208(a2b_l)
     54026    0.154    0.000    0.154    0.000 netstring.py:3(netstring)
     20879    0.149    0.000    0.149    0.000 hashutil.py:26(digest)
      7366    0.111    0.000    0.111    0.000 hashutil.py:163(_xor)
     36830    0.093    0.000    0.093    0.000 string.py:306(join)
      3683    0.088    0.000    0.088    0.000 encoder.py:205(iterencode)
     58928    0.080    0.000    0.080    0.000 string.py:480(translate)
         1    0.076    0.076    1.905    1.905 dirnode.py:248(_pack_contents)
      7366    0.066    0.000    0.066    0.000 netstring.py:7(split_netstring)
     29464    0.060    0.000    2.141    0.000 base32.py:48(b2a)
     49124    0.058    0.000    0.058    0.000 hashutil.py:23(update)
      3683    0.056    0.000    0.386    0.000 dirnode.py:196(_encrypt_rwcap)
      7366    0.055    0.000    0.199    0.000 hashutil.py:48(tagged_pair_hash)
     13513    0.054    0.000    0.134    0.000 hashutil.py:38(tagged_hasher)
      3683    0.050    0.000    0.050    0.000 decoder.py:341(raw_decode)
         1    0.046    0.046    2.999    2.999 dirnode.py:218(_unpack_contents)
      3683    0.042    0.000    0.155    0.000 dirnode.py:207(_decrypt_rwcapdata)
      7366    0.040    0.000    1.697    0.000 base32.py:199(a2b)
    125222    0.040    0.000    0.040    0.000 assertutil.py:30(precondition)
      3683    0.040    0.000    0.151    0.000 hashutil.py:166(hmac)
     13513    0.037    0.000    0.286    0.000 hashutil.py:43(tagged_hash)
     20879    0.035    0.000    0.035    0.000 hashutil.py:19(__init__)
 }}}
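
 (The listing above is standard {{{cProfile}}}/{{{pstats}}} output; the
 sort of invocation that produces it looks roughly like this -- the
 function and variable names here are placeholders:)

 {{{
 import cProfile
 import pstats

 # Profile one run of the code under test and dump the stats to a file.
 # "unpack_and_repack" and "packed_contents" stand in for whatever is
 # being measured.
 cProfile.run("unpack_and_repack(packed_contents)", "dirnode.prof")

 stats = pstats.Stats("dirnode.prof")
 stats.sort_stats("time")   # sort by internal time (the tottime column)
 stats.print_stats(32)      # "List reduced ... due to restriction <32>"
 }}}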

 Whoops, time for me to go to work!  I hope to measure and commit the rest
 of my optimization patches today after work.  There is one that I really
 want other people's input on before I commit it...  That one generates the
 per-entry IV with a secure hash of the writecap instead of with
 {{{os.urandom(16)}}}.
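
 For concreteness, the proposed change is roughly of this shape (a sketch
 only; the tag string and helper names below are made up, not taken from
 the actual patch):

 {{{
 import hashlib
 import os

 IV_TAG = "allmydata_dirnode_entry_iv_v1"  # illustrative tag, not the real one

 def make_iv_random():
     # current behavior: a fresh random 16-byte IV for every entry
     return os.urandom(16)

 def make_iv_from_writecap(writecap):
     # proposed behavior: derive the 16-byte IV deterministically from a
     # tagged secure hash of the entry's writecap
     return hashlib.sha256((IV_TAG + ":" + writecap).encode("utf-8")).digest()[:16]
 }}}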

-- 
Ticket URL: <http://allmydata.org/trac/tahoe/ticket/329#comment:22>
tahoe-lafs <http://allmydata.org>
secure decentralized file storage grid

