Wednesday, December 10, 2008

OCaml objects wrap C++

Amended SRILM generateSentence with prepopulated context, fixed end-of-sentence termination, and wrapped lmclient into a new lmclass. Still need to implement Mauricio’s advice on Gc properly -- the compiler rejects things like

Gc.finalise self#destroy, or Gc.finalise ignore (Lmclient.destroy handle) -- during runtime!
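
A likely explanation, offered as a guess rather than a diagnosis: Gc.finalise has type ('a -> unit) -> 'a -> unit and raises Invalid_argument when its second argument is not heap-allocated. In the second form, Lmclient.destroy handle runs right away and presumably returns unit, which is an immediate value. The first form, assuming destroy takes unit, only partially applies Gc.finalise, so nothing gets registered. Below is a minimal sketch of one possible workaround, wrapping the handle in a heap-allocated record. Fake_lm, lm, close and open_lm are hypothetical names standing in for the real bindings, which are not shown in this post.

```ocaml
(* Hypothetical stand-in for the C binding; NOT the real Lmclient module. *)
module Fake_lm = struct
  let live = ref 0                       (* number of clients currently alive *)
  let create () = (incr live; !live)     (* returns an int handle *)
  let destroy (_ : int) = decr live
end

(* A heap-allocated wrapper around the immediate int handle,
   so that Gc.finalise accepts it. *)
type lm = { handle : int; mutable closed : bool }

(* Idempotent: safe to call by hand and again later from the finaliser. *)
let close lm =
  if not lm.closed then begin
    lm.closed <- true;
    Fake_lm.destroy lm.handle
  end

let open_lm () =
  let lm = { handle = Fake_lm.create (); closed = false } in
  Gc.finalise close lm;                  (* both arguments given; lm is a block *)
  lm

(* unit is an immediate value, so Gc.finalise refuses it at runtime. *)
let finalise_rejects_immediate =
  try Gc.finalise ignore (); false
  with Invalid_argument _ -> true
```

The closed flag is there because the finaliser may still run after an explicit close, and a double destroy on the C++ side would be unpleasant.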

The wrapped system now works and the new method Lmclient.complete_sentence maxwords context properly generates exact-size completions.

I also got the book “Q for Mortals” and continued to evaluate J; also got the educational APL from Dyalog, but it’s only for Windows.

Ah, and my C bindings handle is now an abstract

type handle

-- also thanks to Mauricio. I added things like a null () value and is_null, returning a null value from C and checking it there, similar to limited private types in Ada.
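
The shape of that interface can be sketched in pure OCaml. This is only an illustration: in the real bindings null and is_null are implemented in C, whereas here the representation is an int with 0 playing the role of null, and the module name Handle and the constructor of_raw are hypothetical.

```ocaml
module Handle : sig
  type t                        (* abstract: clients cannot see or forge it *)
  val null : unit -> t
  val is_null : t -> bool
  val of_raw : int -> t         (* hypothetical, for illustration only *)
end = struct
  type t = int                  (* assumption: 0 stands in for the C null *)
  let null () = 0
  let is_null h = (h = 0)
  let of_raw n = n
end

(* Outside the module the only way to inspect a handle is through is_null. *)
let describe h = if Handle.is_null h then "null" else "live"
```

Because the signature hides the representation, something like h + 1 on a Handle.t is a type error, which is roughly what a limited private type buys in Ada.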

Tuesday, December 9, 2008

OCaml bindings

Continued making object-wrapping bindings for OCaml representing SRILM C++ classes. The caml-list advice from Mauricio and Filipp Monnier is cool.

In search of OCaml bindings, found PLplot and tried to install it in macports -- it doesn’t work with gtk-osx +no_x11; discussed that on the macports list and plplot-general. Basically, in an svn checkout of plplot, you have to say,

cmake cmake

-- the latter is a directory -- and it will generate enough to run

ccmake .

-- which will provide graphical displays.

J, kx/Q, APL

Stumbled upon kx/Q in Stevey’s rant on “a portrait of a n00b,” containing the silliest quote about “OCaml, Haskell, and their ilk.” That brought me back to perusing kx/Q and a tutorial by Boroff, and also the J primer -- which is a treat to launch and follow. Ken Iverson clearly wrote it. I wish I could program new mobile platforms at 80 as he did.

Quote:
A word is a group of characters from the alphabet that has a meaning.

LM novelties

Found RandLM, based on Bloom filters -- championed by Broder too. I need LM generation for sentence completion -- this can be done rather easily with SRILM, but could be cool with other methods, e.g. associative LMs. Gathered another dozen papers on LMs, including WSME by Roni Rosenfeld, and others.

New NLP findings with OCaml

HunPoS is a part-of-speech tagger in OCaml, implementing HMM with suffix recognition. This is a superb foundation for sequence HMM! Has a fast C lib. It didn’t build with my new OCaml 3.11 without ocamlfindlib.cmxa -- make world.opt didn’t do it, and neither did make opt.opt.

Also found SWIG for OCaml, looking really mature with C++, representing class objects as closures. Am itching to wrap SRILM systematically in SWIG.

OCaml binding to SRILM ngram works!

I’ve emailed it back to Andreas. My own version of pplFile redirects cerr to an ostringstream instead of cout and captures it; the captured text is later parsed by three_fourths in The Perp System, since ngram -debug 1 outputs the perplexities of the sentences on the 3rd line, and then on each 4th line after that.
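
The line selection described above can be sketched as follows. The name three_fourths is from this post, but the body is only a guess at its shape, not the actual code from The Perp System: it keeps the 3rd line and every 4th line after it (lines 3, 7, 11, ...) from a list of lines.

```ocaml
(* Keep lines 3, 7, 11, ... counting from 1.
   A sketch only; the real three_fourths is not shown in this post. *)
let three_fourths (lines : string list) : string list =
  let rec go i acc = function
    | [] -> List.rev acc
    | line :: rest ->
      let acc = if i mod 4 = 3 then line :: acc else acc in
      go (i + 1) acc rest
  in
  go 1 [] lines
```

Feeding it the captured text split on newlines would leave only the perplexity lines, ready for further parsing.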

I’m creating LM clients inside C++ and returning integer handles to OCaml; one’s supposed to call lm_destroy for every lm_create, in reverse order, with no creations after deletions. This corresponds to my use case, but it surely can be generalized for better bindings, and switched from static LMClient *clients[] to std::vector<LMClient*>.
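
One way the ordering restriction could be lifted is to hand out handles that are never reused and to keep live clients in a table keyed by handle. The sketch below models that idea in OCaml rather than in the C++ layer where the real table lives. Registry and its functions are hypothetical, and a string stands in for an actual LMClient.

```ocaml
(* Hypothetical model of a generalized handle table; not the real C++ code. *)
module Registry = struct
  let next = ref 0                                   (* handles are never reused *)
  let table : (int, string) Hashtbl.t = Hashtbl.create 16

  (* Register a client (a string here) and return a fresh handle. *)
  let create (client : string) : int =
    let h = !next in
    incr next;
    Hashtbl.replace table h client;
    h

  (* Destroy in any order; unknown handles are ignored. *)
  let destroy (h : int) : unit = Hashtbl.remove table h

  (* None means the handle was destroyed or never existed. *)
  let find (h : int) : string option = Hashtbl.find_opt table h
end
```

With this scheme a stale handle simply fails the lookup instead of pointing at some other client, at the cost of a hash lookup per call.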