Dividing a file into chunks along line endings in Erlang
I’ve been dabbling in Erlang recently. I’ve wanted to learn a functional programming language for a while now, and Erlang’s concurrency model makes it rather attractive.
For my “hello world” app, I decided to write a simple log parser that processes chunks of a file in parallel. Here is the part of that app that produces a list of {chunk, Start, End} tuples describing the chunks, with each boundary moved forward to the next newline (Unix newlines, \n, in this case).
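%% Walks backwards from the end of the file, prepending {chunk, Start, End}
%% tuples to ChunkDivisions. Size is the offset at which the next chunk ends.
getChunkDivisions(_File, 0, Chunksize, ChunkDivisions) ->
    %% Nothing left to divide: Chunksize carries the end of the first chunk.
    [{chunk,0,Chunksize}|ChunkDivisions];
getChunkDivisions(File, Size, Chunksize, ChunkDivisions) ->
    if
        Size-Chunksize=<0 ->
            %% At most one chunk remains; it runs from 0 to the adjusted end.
            %% (Chunksize-(Chunksize-Size) is simply Size.)
            ComputedChunkEnd = Size,
            CorrectedChunkEnd = walkToNextLineBreak(File, ComputedChunkEnd),
            getChunkDivisions(File, 0, CorrectedChunkEnd, ChunkDivisions);
        true ->
            %% Step back one chunk, then move that boundary to a line break.
            ComputedChunkEnd = Size-Chunksize,
            CorrectedChunkEnd = walkToNextLineBreak(File, ComputedChunkEnd),
            getChunkDivisions(File, CorrectedChunkEnd, Chunksize, [{chunk,CorrectedChunkEnd,Size}|ChunkDivisions])
    end.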
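%% Reads up to 1024 bytes starting one byte before Start and returns Start plus
%% the 1-based index of the first \n in that data. string:chr/2 returns 0 when
%% no \n is found, so Start comes back unchanged in that case. Data must be a
%% string (list), so File must not be opened in binary mode.
walkToNextLineBreak(File,Start) ->
    {ok, _} = file:position(File, Start-1),
    {ok, Data} = file:read(File, 1024),
    Start+string:chr(Data, $\n).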
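For comparison, here is a minimal, self-contained sketch of the same idea that walks forward from the start of the file instead of backward from the end. It is not the code from my app: the module name chunk_sketch and the functions chunk_divisions/2 and next_line_start/3 are made up for illustration. It assumes the file is opened in raw binary mode, uses file:pread/3 and binary:match/2 in place of file:position/2 and string:chr/2, keeps reading when a line is longer than 1024 bytes, and treats End as exclusive.

```erlang
-module(chunk_sketch).
-export([chunk_divisions/2, next_line_start/3]).

%% Split the file at Path into {chunk, Start, End} tuples of roughly
%% ChunkSize bytes. Start is inclusive and End is exclusive. Every boundary
%% other than 0 and the file size falls just after a \n.
chunk_divisions(Path, ChunkSize) when is_integer(ChunkSize), ChunkSize > 0 ->
    Size = filelib:file_size(Path),
    {ok, File} = file:open(Path, [read, raw, binary]),
    try
        divide(File, 0, Size, ChunkSize, [])
    after
        file:close(File)
    end.

%% Accumulate chunks front to back until the whole file is covered.
divide(_File, Start, Size, _ChunkSize, Acc) when Start >= Size ->
    lists:reverse(Acc);
divide(File, Start, Size, ChunkSize, Acc) ->
    End = next_line_start(File, Start + ChunkSize, Size),
    divide(File, End, Size, ChunkSize, [{chunk, Start, End} | Acc]).

%% Return the offset of the first line start at or after Pos, capped at Size.
%% Pos is expected to be at least 1.
next_line_start(_File, Pos, Size) when Pos >= Size ->
    Size;
next_line_start(File, Pos, Size) ->
    %% Begin one byte early so that a Pos already sitting on a line start
    %% (the byte before it is \n) is returned unchanged.
    From = Pos - 1,
    {ok, Data} = file:pread(File, From, 1024),
    case binary:match(Data, <<"\n">>) of
        {Index, 1} ->
            %% Index is 0-based within Data; the next line starts after it.
            From + Index + 1;
        nomatch ->
            %% No newline in this block: continue with the next block.
            next_line_start(File, From + byte_size(Data) + 1, Size)
    end.
```

For a file containing "aaa\nbb\ncccc\nd\n" (14 bytes), chunk_sketch:chunk_divisions(Path, 5) returns [{chunk,0,7},{chunk,7,12},{chunk,12,14}]. Because each chunk ends where the next one begins, the tuples can be handed to separate processes without any line being read twice.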