Posts Tagged “python”

For why and reference material, see Part 1.

So first off, the d and e vectors. Recall that multiplication of terms is accomplished by an AND gate since the operands are bits, and addition is accomplished by an XOR gate. I’ll go over the construction of d, but the same steps apply for e as well.

def gen_d(m):
    a = ['a[%i]' % i for i in xrange(m)]
    a.reverse()
    a = a + [False for i in xrange(m-1)]
    L = []
    for i in xrange(m):
        L.append(a[-m:])
        a.pop()

    b = ['b[%i]' % i for i in xrange(m)]
    d = [' ^ '.join([s for s in map(cond_and, L[i], b) if s != ''])
         for i in xrange(m)]

    return ['d%i = ' % i + d[i] for i in xrange(m)]

The only parameter is m, which is the m in GF(2m), so for GF(256), m = 8.

First, I need to construct the L matrix, which is m by m. I use something similar to a sliding window to construct the rows of L, popping the trailing element out on each iteration to shift the window to the left.

What’s interesting to note is that instead of padding the row with zeros, like in the paper, I used a boolean. I could have also used an empty string, but it’s better to play it safe and use an explicit boolean. On the other hand, a non-empty string evaluates to “True.”

To do the matrix multiplication, I use map on a row of L and the vector b. The function that is applied to pairs of elements is below:

def cond_and(x, y):
    if x:
        return '(%s and %s)' % (x, y)
    return ''

To make the equation strings a bit cleaner, if the row element is False (i.e. 0) then an empty string is returned. Otherwise, it constructs the product term of the row element and the vector element. The string builder filters out those empty strings when xor-ing the product terms together.

Lastly, the appropriate di term is placed at the beginning of each equation, and a list is returned. To print out the entire list, try print '\n'.join(gen_d(8))

To generate the terms in e, the exact same steps occur, except that the U matrix is m-1 by m, and so the number of equations is m-1.

def gen_e(m):
    a = ['a[%i]' % i for i in xrange(m)]
    a.reverse()
    a = [False for i in xrange(m-1)] + a[:-1]

    U = []
    for i in xrange(m):
        U.append(a[-m:])
        a.pop()

    b = ['b[%i]' % i for i in xrange(m)]
    e = [' ^ '.join([s for s in map(cond_and, U[i], b) if s != ''])
         for i in xrange(m-1)]

    return ['e%i = ' % i + e[i] for i in xrange(m-1)]

When you print out the terms for d and e one after the other, you see how they could interlock, hence the very regular layout of the multiplier.

Code to generate QT is below.

def Q_transpose(prim):
    mul = gf_256.mul_table(prim)
    m = 8
    alphas = [1]
    for i in xrange(1, 15):
        alphas.append(mul[2][alphas[i-1]])

    combinations = [to_bits(i) for i in xrange(256)]
    x = [0 for i in xrange(256)]
    for i in xrange(256):
        k = reduce(gf_add, map(multiply, combinations[i],
                               alphas[:m]))
        x[k] = combinations[i]

    Q = [0 for i in xrange(7)]
    for i, j in zip(xrange(7), alphas[m:]):
        Q[i] = x[j]

    Q = numpy.array(Q)
    Q_t = Q.transpose()
    return Q_t

Right off the bat you see that I generate a finite field multiplication look-up table based on the primitive polynomial (i.e. 0×1D). In the interests of keeping things concise, I’m not going to give the code to generate such a table, but you may find Wikipedia useful for coding that stuff yourself.

I’m too lazy to write out equation (5) so look it up yourself. But the important points is that Q can only have binary elements, so multiplication is really just an on/off switching, and since we’re in the finite field, addition is XOR.

I never figured out why people always use “primitive element” and “α” when they could just say “2″, because that’s what it really is in all practical finite fields. So I construct a vector containing the powers of 2 within the finite field, by repeated multiplication using the look-up table.

Since there are only 256 possible linear combinations of α0 … α7, I use a brute force approach, converting 0 … 255 into boolean vectors.

def to_bits(byte):
    ret = []
    for i in xrange(8):
        ret.append(bool(byte & 1))
        byte = byte >> 1
    return ret

Now in Python, you can mix types to a certain extent. I’m using the property that a boolean can multiply with an integer, with the result being either the integer itself (if the boolean was True) or 0 (if the boolean was False). I’m not going to post the code to the multiply function.

The list of products are reduced, i.e. summed using XOR. I’m not going to post the code to the gf_add function.

In this fashion, I compute the result of all possible linear combinations, and pick the ones that equal to α8 … α14. numpy makes it easy to convert a list of lists into an array and then transpose it. If you use Python on Linux, chances are good that you already have numpy, or even scipy.

To implement c = d + QTe, see below:

def gen_c(Q_t):
    e = ['e%i' % i for i in xrange(7)]
    c = []
    for i in xrange(8):
        t = ' ^ '.join([s for s in map(multiply, Q_t[i], e) if s != ''])
        s = 'c%i = d%i ^ %s' % (i, i, t)
        c.append(s)
    return c

I use the exact same multiply function without regard for typing. Python returns an empty string if the boolean operand is False, and the string itself if the boolean operand is True, so a simple filter gets rid of the terms that are not needed.

To verify, you can copy and paste the code blocks for d, e, and c into a function that takes a, b as arguments, and returns an integer that the resulting c elements represent. Then you can repeatedly call that function with all possible inputs and verify the result against the multiplication look-up table. I leave all that as an exercise for the reader.

With a bit of tweaking to the above code, or just find/replace, you can use the code blocks as the body of a VHDL entity or Verilog module. That is also an exercise for those inclined. Of course, for multiplication by a constant the logic could be simplified beforehand, but I would be perfectly fine with the synthesizer optimizing that stuff out.

Tags: , , ,

Comments No Comments »

So for various reasons, it turns out that I don’t really need Reed-Solomon codes for what I’ll be doing, but I still have more than a passing interest in the decoding side of things, if only for hobby reasons.

You can make the case that the fundamental building block for a Reed-Solomon decoder is the finite field multiplier. Addition in GF(2m) is easy — just xor — but multiplication is another story entirely. The multiplier itself is dependent upon the field primitive polynomial, so that’s why you don’t see embedded FFM blocks on an FPGA, or even a general purpose processor for that matter.

That may change in the future when Intel introduces AVX with its cryptography accelerating instructions, but even then, from what little I can understand, it doesn’t sound like a single instruction does FFM on a register; you may still have to do the modulo reduction yourself.

Anyway, I’m here today to talk about hardware FFM’s, specifically on 8-bit operands. Google has a tough time finding information on how to design a FFM, although I suppose it all depends on how you ask it. You can find some implementations, like on OpenCores, but I’m not that enthusiastic about having to register for a once totally open site (you can try your luck with BugMeNot though), plus implementations rarely ever convey how they were created.

I finally found a fairly friendly method, in a paper by A. Reyhani-Masoleh and M. A. Hasan. See Low Complexity Bit Parallel Polynomial Basis Multiplication over GF(2m).

Note that “FFM”, “finite field”, or “Galois” are nowhere to be found in the title. I suppose I am unfamiliar with the nomenclature, which is what made searching for information more difficult.

I think it should also be noted that IEEE papers are generally not freely available, so if it weren’t for Dr. Reyhani-Masoleh’s generosity, you’d have to access the paper through your own, or university’s, subscription to IEEE Xplore.

For the ASIC layout folks, the paper details a very regular structure for the multiplier, so bonus.

From a design standpoint, there are three variables. Two of them, the d and e vectors, depend only on the bit-width so they are constant regardless of the primitive polynomial. How they are combined is determined by the transpose of the Q matrix, and to get at Q you need to solve equation (5).

The elements of Q are constrained to be either 1 or 0, so multiplication is really an on/off operation, and addition is xor. At least for small values of m (the m in GF(2m)) you can brute force the solution. Also, when working out d and e from the Toeplitz matrices, multiplication is the and operation.

All vectors are indexed starting from 0, i.e. e = [e0, e1, ... , em-2]. Bit-level multiplication is accomplished with an and gate.

The algorithm lends itself well to code generation, so I wrote my own Python functions to generate and verify the basic bit-level equations. With a bit of find/replace, it’s easy to just port those over to something like VHDL. Here’s a link for a 0×1D multiplier in GF(256), i.e. P(x) = x8 + x4 + x3 + x2 + 1. I think it should synthesize, but without running it through Quartus or ISE, I can’t say for sure. Until I get my hands on an FPGA development board, I’m not really motivated to install either of those packages.

In Part 2, I’ll post my Python code, which I’m more confident in. For now, enjoy the multiplier, whose polynomial is used in ATSC, WiMAX, and DVB Reed-Solomon codes.

Tags: , , , , ,

Comments 1 Comment »

In no particular order:

  • string.maketrans is good for implementing simple substitution ciphers.
  • A brief introduction to regular expressions, and generally a variety of built-in modules.
  • Recognizing the preamble bytes for bzip, zlib, PNG, and JPEG.
  • You will love the Python Imaging Library.
  • List operations, list comprehensions (but not necessarily generator expressions), the element selection parameter (i.e. foo[::3] selects every third element, foo[::-1] returns a reversed list).
  • File I/O, StringIO. And somewhere along the line, manipulating byte streams as strings and such.

In terms of game specific lessons, which are really just tips:

  • You can embed arbitrary data into an image.
  • Conversely, you can embed an image in just about anything. When in doubt, try to turn the data in front of you into an image. If you have an image, particularly if it’s large, look for an image that’s embedded.
  • PNG and GIF can compress losslessly, but JPEG cannot. While not always a rule, chances are good that when encountering a PNG or GIF, you should do something with that image. See above, but again it’s not always the rule.
  • Keep logs of your steps. They sometimes contain information as well.
  • You should load puzzle pages with Python, or at least have Firebug running. Preferably both, as they reveal things that don’t render in the browser. No, I’m not talking about HTML comments although there are those too.
  • Sometimes, Python is not the best answer. Shocker, but I can’t be bothered to use smtplib when GMail will do.
  • But other times, python is the answer.
  • On that track, if the problem in front of you is non-trivial, and someone has solved it and packaged it into an easy to use web application, just use the web app. I could have slogged away at a solution over a couple of weeks, but I just wanted to get to the last level.

That’s about it. Google can give you solutions for levels up to 24, but resist the temptation if possible. At the very least, try to write and run your own code based on the solution scripts that you see. This is supposed to be an educational experience, and there’s no prize at the end, as stated on the challenge site. You’ll learn absolutely nothing by just entering in the solution URL’s, but I am also a strong believer in the idea that you cannot learn from a puzzle by osmosis. At least books tell you stuff, but a puzzle only tells you something after you’ve solved it.

If you’re so inclined to attempt the Python Challenge and get stuck, you can drop me a comment here, or just sign up and post for help on the hint forums, and I can send you a private message.

Tags: , ,

Comments No Comments »

Ninja edit: I could have sworn that I tried treating the data as an image before, but I suppose I had the dimensions reversed because it didn’t work out. On my way to figuring this level out. Details later.

Been hitting my head against level 30 of the Python Challenge. The hint forum seems quite dead — it has been a couple years since a new puzzle came out — and a lot of sites external to the solutions wiki fall short of level 30, so I figured that asking here is about as good as asking anywhere else.

Anyway, many of the puzzles don’t require specific use of Python, and this one of those, I think. Check out the CSV file. It’s not quite a perfect CSV because there’s a number missing from the last row, but treat it more as a list of floating point numbers. From the available hints, the starting point is to find the length of the list (7367) and obtain its factors (53 and 139, which are prime).

Now, we are supposed to be look at the data in a “different way” with the factors obtained. “Formula” is a term that’s thrown around a lot, and I just can’t see one. I mean, how might 0.82207, 0.91181, etc. be related to 53 and 139? One thing to consider is that the decimal is misleading, in which case how might 82207, 91181, etc. be related to 53 and 139?

I’d be interested in any thoughts on this.

Tags: ,

Comments 6 Comments »

This is a follow-on from my previous post. I mentioned how I wrote my own app after giving up on Yahoo Pipes’ Unicode quirks. Well, I think I found the solution. It’s partly Yahoo’s fault, and partly me not reading some documentation.

I consider myself a programming novice in general, not just in Python, so bear with me.

On Yahoo’s side, it has inconsistent support for UTF-8. On the pipe output there is a mix of proper UTF-8 rendered characters, as well as wrong characters stemming from each byte of their multi-byte sequences being placed as-is.

Here’s a common example that I came across. Consider the byte sequence 0xE28099, typically represented in Python as “\xe2\x80\x99″. When interpreted as a UTF-8 character, it winds up as ‘U+2019′ or in Python as u\’u2019′. The u prefix indicates a unicode string, and the escape sequence basically says Unicode character 2019. If you go to print this out, it is the right single quote, i.e. an apostrophe.

But rather than rendering an apostrophe in the pipe output, you get “\xe2\x80\x99″ but in a Unicode string. So all over the place I see things like, “I\xe2\x80\x99m” and “haven\xe2\x80\x99t”. Raw UTF-8 embedded in UTF-8. Fun stuff. To my knowledge, there’s no operator within Pipes that could fix something like this, although there might be a regex that could do the trick. I don’t know the first thing about regular expressions though.

Not too long before, while looking for articles on Google App Engine, I came across a post on using GAE for the Web Service operator. Free hosting for Pipes filters? Sign me up!

Yahoo Pipes posts its data to the address in the Web Service operator, in the form of a UTF-8 encoded JSON string. It expects either a JSON string or RSS/XML document in return, but it’s probably more convenient to return JSON since you’re working with it. Plus, I did try the RSS/XML option once but Pipes output just showed a zero. I didn’t feel like pursuing the problem.

So here is the handler that services the Yahoo Pipes request:

from django.utils import simplejson

class PipesScrubber(webapp.RequestHandler):
  def post(self):
    data = self.request.get('data')
    cleaned = tidy_unicode(data)
    posts = simplejson.loads(cleaned)['items']
    self.response.headers['Content-Type'] = 'application/json'
    simplejson.dump(posts, self.response.out, ensure_ascii=False)

The simplejson library is bundled with django which is bundled with GAE so you don’t have to upload your own. What I neglected to do the first time I tackled this problem was to set that ensure_ascii argument to False. Looking back, it would be the obvious thing to do because if I got UTF-8 JSON, I should return UTF-8 JSON.

But by default, simplejson generates ASCII-encoded strings, while escaping all HTML and Unicode characters. I was baffled by this for the longest time, because the solution seemed sound but the Pipes output had just as strange looking stuff because of all the escaping. Turns out I should have read the simplejson documentation a bit more thoroughly.

Here is the tidy_unicode function that does all of the heavy lifting:

import StringIO
import array

def tidy_unicode(raw):
  ret = StringIO.StringIO()
  lim = len(raw)
  i = 0
  while i < lim:
    try:
      val = ord(raw[i])
      if val < 0x80 or val > 0xF4:
        ret.write(raw[i])
        i += 1
      elif 0xC0 <= val <= 0xDF:
        s = array.array('B', [ord(c) for c in raw[i:i+2]]).tostring()
        ret.write(s.decode('utf_8'))
        i += 2
      elif 0xE0 <= val <= 0xEF:
        s = array.array('B', [ord(c) for c in raw[i:i+3]]).tostring()
        ret.write(s.decode('utf_8'))
        i += 3
      else:
        s = array.array('B', [ord(c) for c in raw[i:i+4]]).tostring()
        ret.write(s.decode('utf_8'))
        i += 4
    except UnicodeDecodeError:
      raise NameError(array.array('B', [ord(c) for c in raw[i:i+4]]))

  return ret.getvalue()

The try/except block is left-over debug code when I was hitting my head against my desk. Doesn’t hurt to leave it in, as the JSON strings aren’t super long so performance isn’t that much of a concern anyway. Which is good, because this code is quite horrible, like trying to write C in Python and expecting the same performance. But I couldn’t think of any other way (always open to suggestions though). After all, the problem is that raw UTF-8 strings are being embedded in an otherwise fine UTF-8 string.

As such, you can’t just do something like utf_8_str.decode(’utf_8′) because it would treat each “raw” byte as ASCII, find that the leading byte is greater than 127, and then blow up. Instead, the byte sequences have to be isolated and converted to a regular string which can be interpreted as UTF-8. Complicating matters a tad is that byte strings are variable length, so range checks have to be performed to calculate the length.

One thing that could be changed right off the bat, besides removing the try/except block, is the use of generator expressions instead of list comprehensions when doing the byte sequence conversion. Might be a bit more memory efficient, but on GAE memory is dirt cheap, and this app isn’t using gobs of it anyway.

There you have it. Hope it was useful, or educational at least.

Tags: , , ,

Comments No Comments »

I think I’ve found a bug, as the title suggests. Probably none of you reading this know about AvsP, so this is more of a reminder to myself once my probationary period on Doom9 is over and I can make a post.

This bug is actually poorly correlated to a side project that I’m working on at the moment, but would eventually become a show stopper. The offending line is line 327 in avisynth.py, in the GetHeight function of a class called PVideoFrame:

if self.p.contents.pitchUV!=0 and (plane==PLANAR_U or PLANAR_V):

Here’s a bit of background. Instead of RGB for a video colour scheme, frames are encoded using one luminance and two chrominance values. While more ethereal and not as intuitive as RGB, it is more efficient and is the foundation of the three major analog TV standards. PLANAR_U and PLANAR_V denote the chrominance values, so it makes sense to keep chrominance operations on U and V separate from the luminance channel, which is PLANAR_Y.

But with the above line, operations that should be exclusive to U and V will are opened up to Y as well! Why?

In Python, any non-zero value evaluates to “true” in an if statement, much like in C. Now consider the if statement. As long as pitchUV is non-zero, the first condition is true. But notice that the second half is always true. Why? Because PLANAR_V is non-zero, so it’s true. Even if plane was not equal to PLANAR_U, we would have “false or true”, which is always true, so the bracketed condition is always true overall.

The solution, then would be to do something like this:

if self.p.contents.pitchUV!=0 and (plane==PLANAR_U or plane==PLANAR_V):

Now the second sub-condition won’t evaluate to true all the time. But better yet, we should step back and ask, “What is it do we want to accomplish with this line?” We want to determine if plane is either PLANAR_U or PLANAR_V, but at a more abstract level, we want to know if plane belongs to a set whose elements are PLANAR_U and PLANAR_V. Python supports just such a construct:

if self.p.contents.pitchUV!=0 and (plane in (PLANAR_U, PLANAR_V)):

Tags: ,

Comments No Comments »