Wednesday, December 10, 2008

OCaml objects wrap C++

Amended SRILM generateSentence with prepopulated context, fixed end-of-sentence termination, and wrapped lmclient into a new lmclass. Still need to implement Mauricio’s advice on Gc properly -- compiler rejects things like

Gc.finalise self#destroy, or Gc.finalise ignore (Lmclient.destroy handle) -- during runtime!
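One likely reason for the runtime failure (an assumption, since the error itself isn’t shown here): Gc.finalise raises Invalid_argument when the value it is given is not heap-allocated, and both an integer handle and the unit result of Lmclient.destroy handle are immediate values. A minimal sketch of a workaround, with a hypothetical client record standing in for the real lmclass:

```ocaml
(* Hypothetical wrapper: a record is heap-allocated, so it can carry a finaliser. *)
type client = { handle : int; mutable destroyed : bool }

(* Stand-in for the real C-side destroy; here it only flips a flag. *)
let destroy c = if not c.destroyed then c.destroyed <- true

let make_client handle =
  let c = { handle; destroyed = false } in
  Gc.finalise destroy c;   (* accepted: c is a heap block *)
  c

(* Registering a finaliser on an immediate value fails at runtime. *)
let finalise_immediate_fails () =
  try Gc.finalise (fun (_ : int) -> ()) 42; false
  with Invalid_argument _ -> true
```

When the finaliser actually runs is up to the GC, so an explicit destroy should stay available for deterministic cleanup.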

The wrapped system now works and the new method Lmclient.complete_sentence maxwords context properly generates exact-size completions.

I also got the book “Q for Mortals” and continued to evaluate J; also got the educational APL from Dyalog, but it’s only for Windows.

Ah, and my C bindings handle is now an abstract

type handle

-- also thanks to Mauricio. I added things like null () value and is_null, returning a null value from C and checking it there, similarly to limited private types in Ada.
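A rough pure-OCaml sketch of that abstract-handle idea. The int representation and the of_int constructor are assumptions standing in for whatever the C stubs really return; only the signature matters:

```ocaml
module Handle : sig
  type t                      (* abstract: clients cannot inspect or forge it *)
  val null : unit -> t
  val is_null : t -> bool
  val of_int : int -> t       (* hypothetical stand-in for the C constructor *)
end = struct
  type t = int
  let null () = -1            (* assumed null encoding *)
  let is_null h = h < 0
  let of_int i = i
end
```

Outside the module, the only way to test a handle is Handle.is_null, which is the point of keeping the type abstract.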

Tuesday, December 9, 2008

OCaml bindings

Continued making object-wrapping bindings for OCaml representing SRILM C++ classes. The caml-list advice from Mauricio and Filipp Monnier is cool.

In search of OCaml bindings, found PLplot, tried to install it in macports -- doesn’t work with gtk-osx +no_x11, discussed that on macports list and plplot-general. Basically, in an svn checkout of plplot, you have to say,

cmake cmake

-- the latter is a directory -- and it will generate enough to run

ccmake .

-- which will provide graphical displays.

J, kx/Q, APL

Stumbled upon kx/Q in a Stevey’s rant on “a portrait of a n00b,” containing the silliest quote about “OCaml, Haskell, and their ilk.” That brought me back to kx/Q perusal and a tutorial by Boroff, and also the J primer -- which is a treat to launch and follow. Ken Iverson clearly had written it. I wish I could program new mobile platforms at 80 as he did.

Quote:
A word is a group of characters from the alphabet that has a meaning.

LM novelties

Found RandLM, based on Bloom filters -- championed by Broder too. I need LM generation for sentence completion -- this can be done rather easily with SRILM, but could be cool with other methods, e.g. associative LMs. Gathered another dozen papers on LMs, including WSME by Roni Rosenfeld, and others.

New NLP findings with OCaml

HunPoS is a part of speech tagger in OCaml, implementing HMM with suffix recognition. This is a superb foundation for sequence HMM! Has a fast C lib. Didn’t build with my new OCaml 3.11 without ocamlfindlib.cmxa -- make world.opt didn’t do it, neither did make opt.opt.

Also found SWIG for OCaml, looking really mature with C++, representing class objects as closures. Am itching to wrap SRILM systematically in SWIG.

OCaml binding to SRILM ngram works!

I’ve emailed it back to Andreas. My own version of pplFile redirects cerr to an ostringstream instead of cout and captures it; later it’s parsed by three_fourths in The Perp System -- since ngram -debug 1 outputs the perplexities of the sentences on the 3rd line, and then on each 4th line after that.
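The “3rd line, then each 4th after it” selection can be sketched as below. This is not the actual three_fourths from The Perp System, just a guess at its shape; drop and third_then_every_fourth are made-up names:

```ocaml
(* Drop the first n elements of a list (or all of them if it is shorter). *)
let rec drop n = function
  | _ :: tl when n > 0 -> drop (n - 1) tl
  | l -> l

(* Keep the head, skip the next three, repeat. *)
let rec every_fourth = function
  | [] -> []
  | x :: rest -> x :: every_fourth (drop 3 rest)

(* Lines 3, 7, 11, ... of the captured output (1-based). *)
let third_then_every_fourth lines = every_fourth (drop 2 lines)
```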

I’m creating LM clients inside C++ and returning integer handles to OCaml; one’s supposed to call lm_destroy for every lm_create, in reverse order, with no creations after deletions. This corresponds to my use case, but surely can be generalized for better bindings, and switched from static LMClient *clients[] to std::vector<LMClient*>.
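The generalization would live on the C++ side; purely to illustrate the idea, here is a sketch in OCaml of a handle table that allows destroying in any order and creating after deletions. All names here are hypothetical, and handles are never reused:

```ocaml
(* Hypothetical registry mapping integer handles to live clients. *)
type 'a registry = { table : (int, 'a) Hashtbl.t; mutable next : int }

let make_registry () = { table = Hashtbl.create 16; next = 0 }

(* Register a client and hand back a fresh handle. *)
let create reg client =
  let h = reg.next in
  reg.next <- h + 1;
  Hashtbl.replace reg.table h client;
  h

(* Returns false for an unknown or already-destroyed handle. *)
let destroy reg h =
  if Hashtbl.mem reg.table h then (Hashtbl.remove reg.table h; true)
  else false

let find reg h = Hashtbl.find_opt reg.table h
```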

Sunday, December 7, 2008

Every Nth element of a list

For the general case, I’ve done this:

let each_nth n list = List.rev (List.fold_left2
  (fun acc a i -> if i mod n = 0 then a::acc else acc)
  [] list (range (List.length list)))

For each 4th, OlegFink suggested:

let rec fourth = function _::_::_::a::xs -> a::(fourth xs) | _ -> []
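For comparison, a sketch of the general case that carries its own counter, so it needs no index list. The name each_nth_direct is mine, and it counts positions from 1:

```ocaml
(* Keep elements at positions n, 2n, 3n, ... (1-based), in original order. *)
let each_nth_direct n list =
  if n <= 0 then invalid_arg "each_nth_direct: n must be positive";
  let rec go i acc = function
    | [] -> List.rev acc
    | x :: xs -> go (i + 1) (if i mod n = 0 then x :: acc else acc) xs
  in
  go 1 [] list
```

With n = 4 this picks the same elements as the pattern-matching fourth.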

list list transpose

I’ve written the following lili (’a list list) transpose for my perp system:

let transpose lili = List.map (fun n -> List.map (fun li -> List.nth li n) lili) (range0 ((List.length (List.hd lili))-1))

The range0 function generates an integer list [0;1;...;n] (hence the -1 above) and is quite trivial:

let range ?(from=1) upto =
  let rec go from upto acc =
    if from > upto then acc else go from (upto-1) (upto::acc)
  in
  go from upto []

let range0 = range ~from:0

-- it can be further simplified with F#-like |>.

But OlegFink’s solution is simply

let rec transpose = function []::_ -> [] | list -> List.map List.hd list :: transpose (List.map List.tl list)
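One caveat with the short version: on an empty outer list it recurses forever, since List.map over [] keeps producing []. A sketch with that case added (named transpose_safe here to keep it apart; rows are still assumed to have equal length, otherwise List.hd or List.tl will raise):

```ocaml
(* Transpose a rectangular 'a list list; [] and lists of empty rows give []. *)
let rec transpose_safe = function
  | [] | [] :: _ -> []
  | rows -> List.map List.hd rows :: transpose_safe (List.map List.tl rows)
```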

Conditional compilation in OCaml

Thanks to Mauricio Fernandez on IRC, got my with/without pgocaml setup working:

dataframe.cmo: %.cmo: %.ml
        ocamlfind ocamlc -pp "camlp4o Camlp4MacroParser.cmo -DONT_USE_POSTGRES" -c $< -o $@

-- rather than commenting things out, I simply define a nonexistent symbol instead of the required one. Then, in dataframe.ml:

let get ?fromfile () =
  match fromfile with
  | Some file -> load file
  | None -> IFDEF USE_POSTGRES THEN percells_dataframe () ELSE failwith "no database for you" END

Saturday, December 6, 2008

Fixing OCaml ocamlopt for Mac

Posted to the caml-list my findings about caml_atom_table bug from 2005 on Mac OSX, still not fixed, fixable by = NULL in asmrun/startup.c:

char * caml_code_area_start, * caml_code_area_end = NULL;

Found that compiling SRILM without Tcl, by inserting NO_TCL=yes in Makefile.machine.macosx, solves the problem of tcl main/app_init. Had to pepper the -Ldir -llib flags with -cclib when passing them to ocamlopt. Removed old ocaml 3.10 and compiled 3.11 from source, with findlib and pcre.

Friday, December 5, 2008

Wrapping SRILM ngram in OCaml

Learned the hard way that we return a concrete type with

CAMLreturn (Val_int(num_clients))

-- not with (caml_copy_nativeint, or _int32, or _int64) -- thanks to flux of Finland on #ocaml!

Sunday, November 30, 2008

TextMate's OCaml mode

Somehow it knows which function we’re in, in the bottom leftmost menu, but fold handles are all useless.

Now when editing Eval, we see that when no selection is active, the choices of what to send are Document, Word/Char, Line, Scope, or Nothing. It would be reasonable to assume that the Scope is a current function, which is shown in the bottom of the screen, but it isn’t.

Even if we can’t select the current defun when nothing’s selected, perhaps we could select the surrounding block -- defined as contiguous lines with no blank lines in between, surrounded by blank lines. In fact, “Edit=>Select => Paragraph” seems to be even smarter -- it tenaciously finds and selects a preceding comment. So if somehow we could select that and then Eval it, now *that* would be cool.

A good read about Gérard Huet:

http://translate.google.com/translate?u=http%3A%2F%2Finterstices.info%2Fjcms%2Fc_5629%2Fgerard-huet-d-une-frontiere-a-l-autre&hl=en&ie=UTF-8&sl=fr&tl=en

His Zen toolkit is awesome!

http://sanskrit.inria.fr/ZEN/

Friday, November 28, 2008

PGOcaml, etc.

How to change NOT NULL constraint in Postgres (pg):

alter table <name> alter column <column> set not null -- or, drop not null

query column names -- use pg_class, or retain result column names

 SELECT
a.attname as "Column",
pg_catalog.format_type(a.atttypid, a.atttypmod) as "Datatype"
FROM
pg_catalog.pg_attribute a
WHERE
a.attnum > 0
AND NOT a.attisdropped
AND a.attrelid = (
SELECT c.oid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname ~ '^(my_tablename)$'
AND pg_catalog.pg_table_is_visible(c.oid)
)
;

-- or

select a.attname from pg_attribute a, pg_class b
where b.oid=a.attrelid and b.relname='my_tablename'
and a.attname not in ('tableoid','cmax','xmax','cmin','xmin','ctid')

-- or

select attname from pg_attribute
where attrelid = 'percells'::regclass -- 'my_schema.my_tablename'::regclass
and attnum > 0 and not attisdropped;


When doing select * from table, the table may consist of dynamically generated columns. In this case, it would make sense to return each row as a list, not a tuple. Converting a tuple of generally statically unknown length to a list involves Obj.magic (thanks to olegfink on #ocaml), which drew vixey’s wrath there too.

Hence the following pgocaml extensions are needed:

-- properly close connection during camlp4, not abruptly (leaving a log message in pg complaining about client closing connection unexpectedly)
-- list type for rows if desired
-- column names if desired -- looks like we need it only once and can use with describe_statement already? NB check

Tuesday, November 25, 2008

OCaml Findings

Programming ngram sensor data models in OCaml exercised various language features.

-- String continuation -- breaking long strings across the lines

"he\
   llo"


-- glues it together without the leading spaces.
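A quick check of that rule: a backslash at the end of the line, plus any blanks at the start of the next one, vanish from the string.

```ocaml
(* The backslash-newline and the indentation after it are dropped. *)
let s = "he\
         llo"

(* To keep a space at the join, put it before the backslash. *)
let t = "he \
         llo"

let () = print_endline s   (* prints hello *)
```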

Passing a function with optional parameters as an argument, and then using those optional parameters, requires that the function’s type be fully known -- hence the annotation on f. An example from cells/genlm.ml:

let dirwalk (f:?mincount:int -> ?date:string -> string -> unit) ?date root =
  let numbers = Str.regexp "^[0-9]+$" in
  let subdirs = Array.to_list (Sys.readdir root) in
  let subdirs = List.filter (fun x -> Str.string_match numbers x 0 && x <> "0") subdirs in
  (* let subdirs = ["9"] in *)

  match date with
  | Some date ->
    List.iter (fun x -> f (Filename.concat root x) ~date:date) subdirs
  | None ->
    List.iter (fun x -> f (Filename.concat root x)) subdirs
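The same pattern in a self-contained toy, with made-up names (greet, apply_all). It also shows that an optional argument can be forwarded as-is with the ?label form, which would spare the match on date:

```ocaml
(* A function with an optional parameter. *)
let greet ?(greeting = "hello") name = greeting ^ ", " ^ name

(* f's type must be spelled out, or the optional label cannot be used on it.
   ?greeting here is a string option and is passed through unchanged. *)
let apply_all (f : ?greeting:string -> string -> string) ?greeting names =
  List.map (fun n -> f ?greeting n) names
```

Calling apply_all greet ["a"] uses the default, while apply_all greet ~greeting:"hi" ["a"] overrides it.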

Monday, November 24, 2008

A nasty Makefile bug

I copied a sample Makefile for PG’OCaml from Dario Teixeira’s tutorial, and it didn’t work, giving me circular dependencies on lines like

$(PROJECT): $(PROJECT).cmo

It turns out the first line,

PROJECT := cellspans

-- contained a trailing space, so the rule expanded to “cellspans : cellspans .cmo” and the target depended on itself.

It would be useful to add a hook to TextMate Makefile mode to trim the trailing spaces upon save, and same with Emacs (the latter can surely do it).

Saturday, November 15, 2008

F# leads to Mono leads to Gtk

An exploration of F# started in February when I stayed with Misha Entin. The Microsoft Research aura warmed me up to .NET, through the amazing achievement of Don Syme and others. I got Expert F# when it just came out (Foundations was kind of simple next to it, so I returned it to the MS store, via Misha). I’ve even got a harmonica in the key of F#!

Now I got Mono and tried it -- works nicely! Especially with Johan Kullbom’s bindings for TextMate, which allow sending a selection to an iTerm tab with fsi running. Fixed fsi delete for that, creating a BSDel profile for iTerm where backspace now sends 0x08 (had to manually use ^H before). BTW, I copied that to the OCaml bundle as the same kind of Eval, works charmingly.

Well, tried Piotr Zurek’s F# Gtk example -- initially he forgot to add the Application.run () line at the end -- which led me to getting Gtk for Mono. And this is just in time when Imendio’s gtk-osx.com matured enough to have X11-less Gtk!

I tried MonoDevelop and Art Wild’s F# bindings for it. These didn’t work right away off Google Code -- right-clicking on References didn’t do anything. So I edited the .mdp file as XML in TextMate, and saw where things are in the GAC.

Then got the svn versions of mono, addins, and monodevelop. Found I need to check out mcs next to mono, and also got libgdiplus, debugger, gtk-sharp, and webkit-sharp, all there. Compiled it all into --prefix=/mono, MD didn’t work, but I learned a lot about how fragile the dependencies are, just like gnome itself. I tried to compile it on Linux in the times past, about every 2 years, and seeing the word pango makes me shiver still.

So I found a good document on compiling mono 2.0 with macports -- that’s a good way to do it, with X-less Gtk. Since my previous macports build of update-desktop-database for MD pulled out the whole Gnome and then some, I had to uninstall it -- by trial and error, rearranging uninstall lists -- and then reinstall with +no_x11 (<=> +quartz AFAICS). It’s very important to use Mono Parallel Environment here, configuring PKG_CONFIG_PATH properly, as well as MONO_PATH and MONO_GAC_PREFIX.

The next stage of trying Gtk was to get or compile it on Windows. I found instructions how to do it with Cygwin and .NET 1.1, which worked -- then folks on gtk-sharp list couldn’t believe I did it with csc 1.1... The only problem is, in the end we have a bunch of DLLs, and adding them as references doesn’t work by itself -- the glue DLLs are also needed. The problem is solved by adding the glue ones to the path: I created /c/opt/gtk/glue/, put them there, and put it on the path.

Also, F# does not look for things in the GAC. Thus, although the Gtk installer from Medsphere (2.12.2 vs my compiled 2.12.5 at the time) adds the DLLs to the GAC, #r "gtk-sharp" from fsi didn’t cut it. Don Syme said I can specify a registry key, AssemblyFoldersEx, and I simply added c:\WINDOWS\assemblies to it. Am not sure that’s such a good idea, and it doesn’t load things from the GAC still -- or I need to reboot my VMWare first...

Friday, November 7, 2008

Nonnegative Matrix Factorization -- add links

I’ve been playing with the NMF as a tool to enable meaningful clustering and found several useful implementations.

procoders.net: python implementing Sra et al
Suvrit Sra has C++ code for sparse matrices of their own
John Burkardt keeps a collection of Fortran codes, including a DLAP (Double version of SLAP).

Rasmus Munk Larsen has a very nice SVD package PROPACK which I used for Netflix with filling missing values by the means/medians, and it uses OpenMP. I tried it with Intel Fortran, ifort, and MKL parallel BLAS library. I also tried GNU Fortran and GotoBLAS.

SLEPc is a generalized modern SVD package, implementing a Lanczos algorithm similar to Larsen’s. It looks like the SVD to use, in conjunction with PETSc and TAO -- the latter ones are used in CRF with VEB by Lin Liao.

Numerous versions of NMF in MATLAB exist, the most interesting being from the convex optimization toolbox for MATLAB from Stanford.

Ngoc-Diep Ho’s thesis from Louvain from 2008 is the most thorough coverage of NMF to date.