Saturday, July 12, 2014

Elixir Tasks vs Scala Futures

Elixir has introduced a wonderful first-class concept called a Task. It allows you to do some work in a new process and easily collect the result, like so:

iex(1)> something = Task.async fn ->
...(1)>   10 + 6
...(1)> end
%Task{pid: #PID<0.44.0>, ref: #Reference<0.0.0.55>}
iex(2)> Task.await(something)
16
At first glance, it looks a lot like constructs found in other languages. For example, Scala has a Future construct that seems similar:

scala> import scala.concurrent._
import scala.concurrent._
scala> import scala.concurrent.duration._
import scala.concurrent.duration._
scala> import ExecutionContext.Implicits.global
import ExecutionContext.Implicits.global
scala> val something = Future { 10 + 6 }
something: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@5535cbe
scala> Await.result(something, 0 nanos)
res1: Int = 16
Seems pretty similar at first glance, right? While it looks like they offer the same abstraction - the ability to asynchronously execute code and return the result - there are some key differences.

Poll Based

Let's take a look at Scala Futures. Scala Futures are built on top of JVM threads, along with all the usual suspects - Mutexes, Synchronized Queues, etc. One way in which we can interact with futures is to see if they have completed:

scala> val something_long = Future { Thread.sleep(10000) }
something_long: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise@1aa6a14b
scala> something_long.isCompleted
res7: Boolean = false
...
scala> something_long.isCompleted
res10: Boolean = true

We can poll to see if the Future has completed. This, in fact, is at the core of how the Await.result method works (as of Scala 2.10). It polls the Future for completion until either the time runs out or the Future has completed. Once complete, the internal value of the Future is returned.

This polling mechanism is built upon the Future being in shared memory. Any thread can access it and therefore any thread can wait for it to complete. Multiple threads can fetch the result and the result can be fetched multiple times.

scala> val something = Future { 10 + 6 }
something: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@47041bea
scala> Await.result(something, 0 nanos)
res12: Int = 16
scala> Await.result(something, 0 nanos)
res13: Int = 16

Push Based

Now let's look at Elixir Tasks. Tasks are built on top of message passing. Without the Task construct, the same thing can be (crudely) implemented using standard Elixir code:

iex(4)> parent = self
#PID<0.40.0>
iex(5)> something = spawn_link fn ->
...(5)>   send parent, 10 + 6
...(5)> end
#PID<0.54.0>
iex(6)> receive do
...(6)>   x -> x
...(6)> end
16

Tasks give us a bit more convenience than the above code, but at their core they are fairly simple. A new process is spawned. When it completes, it sends the response back to the parent process.

Because the result is pushed to the parent process, we find ourselves with a limitation. Task.await must be called from the parent process. If called from a different process, it will never receive the message and thus time out:

iex(9)> something_nested = Task.async fn ->
...(9)>   Task.async fn ->
...(9)>     "nested"
...(9)>   end
...(9)> end
%Task{pid: #PID<0.73.0>, ref: #Reference<0.0.0.138>}
iex(10)> something = Task.await(something_nested, 1000)
%Task{pid: #PID<0.74.0>, ref: #Reference<0.0.0.140>}
iex(11)> Task.await(something, 1000)
** (exit) exited in: Task.await(%Task{pid: #PID<0.74.0>, ref: #Reference<0.0.0.140>}, 1000)
    ** (EXIT) time out
    (elixir) lib/task.ex:173: Task.await/2

A task spawned in another task fails to return a value, exactly as we expected. If we call await inside the first task, all is well:

iex(13)> something_nested = Task.async fn ->
...(13)>   something = Task.async fn ->
...(13)>     "nested"
...(13)>   end
...(13)>   Task.await(something, 1000)
...(13)> end
%Task{pid: #PID<0.99.0>, ref: #Reference<0.0.0.214>}
iex(14)> Task.await(something_nested, 1000)
"nested"
We also can't fetch the value twice:

iex(21)> something = Task.async fn -> 10 end
%Task{pid: #PID<0.118.0>, ref: #Reference<0.0.0.259>}
iex(22)> Task.await(something, 1000)
10
iex(23)> Task.await(something, 1000)
** (exit) exited in: Task.await(%Task{pid: #PID<0.118.0>, ref: #Reference<0.0.0.259>}, 1000)
    ** (EXIT) time out
    (elixir) lib/task.ex:173: Task.await/2

Failure

With Scala Futures, errors are largely silent, at least until you try to use the value:

scala> val somethingBroken = Future { throw new Exception("oops") }
somethingBroken: scala.concurrent.Future[Nothing] = scala.concurrent.impl.Promise$DefaultPromise@65e93514

scala> somethingBroken.isCompleted
res18: Boolean = true

scala> Await.result(somethingBroken, 0 nanos)
java.lang.Exception: oops
 at $anonfun$1.apply(:15)
 at $anonfun$1.apply(:15)
 at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
 at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
 at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
 at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

With Elixir Tasks, an exception in a task will cause the parent process to crash. This is part of the Erlang/Elixir "Let it crash" philosophy.

iex(1)> self
#PID<0.145.0>
iex(2)> response = Task.async fn -> raise "oops" end

=ERROR REPORT==== 12-Jul-2014::14:44:36 ===
** Task <0.149.0> terminating
** Started from <0.145.0>
** When function  == #Fun<erl_eval.20.106461118>
**      arguments == []
** Reason for termination ==
** {#{'__exception__' => true,'__struct__' => 'Elixir.RuntimeError',message => <<"oops">>},
    [{'Elixir.Task.Supervised',do_apply,2,
                               [{file,"lib/task/supervised.ex"},{line,70}]},
     {'Elixir.Task.Supervised',async,3,
                               [{file,"lib/task/supervised.ex"},{line,15}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
** (EXIT from #PID<0.145.0>) an exception was raised:
    ** (RuntimeError) oops
        (elixir) lib/task/supervised.ex:70: Task.Supervised.do_apply/2
        (elixir) lib/task/supervised.ex:15: Task.Supervised.async/3
        (stdlib) proc_lib.erl:239: :proc_lib.init_p_do_apply/3

Interactive Elixir (0.14.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> self
#PID<0.150.0>

Tasks fit in nicely with the rest of the OTP ecosystem, complete with supervisors and monitors.

Why this matters

If you are coming to Elixir from a language like Scala, you might be tempted to treat Tasks like you would Futures. This is generally a bad idea. While they represent similar intents, the implementations mean that they provide different abstractions. A Future provides an abstraction for access to shared state that will exist at some point in the future. A Task provides an abstraction for access to a future response that is limited to a single process. This follows the general pattern in Erlang/Elixir of not sharing state.

The effect of this difference needs to be understood when writing code.

In Scala we frequently provide libraries that return Futures. This allows us to compose functionality from a variety of sources and build pipelines of execution. The underlying execution is abstracted away. (A discussion of the various ExecutionContexts is beyond the scope of this post.)

scala> val response = for {
     |   a <- responseA
     |   b <- responseB
     | } yield {
     |   a + b
     | }
response: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@3b05298d

scala> Await.result(response, 0 nanos)

In Elixir we want to always be explicit about where code is being executed. Rather than have libraries launch tasks on our behalf, the better technique is for libraries to return functions. We can then launch these functions wherever we want:

iex(15)> fun1 = fn -> 25 end
#Function<20.106461118/0 in :erl_eval.expr/5>
iex(16)> fun2 = fn -> 35 end
#Function<20.106461118/0 in :erl_eval.expr/5>
iex(17)> response = Task.async fn ->
...(17)>   responseA = Task.async fun1
...(17)>   responseB = Task.async fun2
...(17)>   Task.await(responseA) + Task.await(responseB)
...(17)> end
%Task{pid: #PID<0.109.0>, ref: #Reference<0.0.0.240>}
iex(18)> Task.await(response)
60

Conclusion

Elixir has been a blast to work with. The language gives you very powerful constructs that enable you to build robust systems. Understanding how to use these constructs is crucial. In the case of Tasks, it is important to always be explicit about where code is executing. If you have any questions, feel free to ask here or on IRC in #elixir-lang.

Sunday, September 29, 2013

How I develop - A stream of consciousness

I recently took the time to dump my "stream of consciousness" for someone trying to wrap their head around how to go about designing an application. With his permission, here is the initial email (from the Utah Ruby Users Group mailing list):

I don't spend a lot of time writing code, but I've been trying to steadily improve my ability to build functional prototypes in ROR and I'm looking for suggestions on how to improve my model design skills.

The entire MVC approach is new to me - I first learned to write code before many of you were born, but I haven't had to use it in any work-related responsibilities for 20 years.  Any suggested resources would be appreciated.

I feel like I learn best by going through examples with explanations.  The Michael Hartl tutorial was a helpful beginning, but I'm having difficulty finding other examples where I can see the functioning app (or a description of the functionality), and a list of all the models - especially with some reasoning behind their decisions.

As an example, my current side project is to make an app I can use with my children to assign and track their chores, and how much allowance I should pay them based on what they've done.  Here's a few details on how the app would work...

Chores are assigned to people, and they have a recurring frequency (daily, weekly, monthly,...).  Sometimes the person that did the chore is not the person assigned to do it.  I want to track all the dates a chore was done, who was assigned to do it, and who actually did it.

In my mind I can see tracking that information as attributes of a chore, or I could have a separate model for chores performed.

How do I make that decision?  While I'd like advice on this specific example, I'm really trying to understand more generally.  I'm finding that this is the most time consuming and difficult part of each new project I undertake.

Any suggestions on how to learn model design "best practices" would be appreciated.

To which I responded:

Remember that this is less about Ruby and Rails and more about software design as a whole. I recommend Agile Software Development, Principles, Patterns, and Practices for a better understanding of the design process. (That's a shameless affiliate link by the way) There's a great example of building a bowling score tracker that demonstrates how to incrementally design. I'm going to walk you through a similar process, but at a higher level than code. This description is unedited (don't worry, no foul language) and is meant to be a stream of consciousness so you can get a feel for how I design things. Everyone is going to have a different process though, and I expect many here to disagree with my decisions. Here goes nothing:

I always try to approach problems from a top down perspective. I mostly work on REST apis, so I'm going to walk through your example from that approach.

So in your example, we definitely have people or users. This would be a normal resource endpoint

/users

and we have a list of all chores. This would be another resource endpoint. 

/chores

we probably have an endpoint that returns all chores assigned to a user

/users/1/assigned_chores

and we may even scope those down by frequency (this is a little off the beaten REST path, but don't let that deter you. Naming things is hard, and as long as there is consistency and a degree of intuition shown, it's fine)

/users/1/assigned_chores/daily
/users/1/assigned_chores/weekly
/users/1/assigned_chores/monthly

and a user has a list of chores completed by them

/users/1/completed_chores
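
As a rough sketch, those endpoints might be declared like this in a Rails routes file. (This is hypothetical code - the route names come from this discussion, not from any real app, and the frequency scoping is shown as a plain query parameter.)

```ruby
# Hypothetical config/routes.rb sketch of the endpoints above.
MyApp::Application.routes.draw do
  resources :users do
    # /users/1/assigned_chores, optionally filtered with ?frequency=daily
    resources :assigned_chores, only: [:index]
    # /users/1/completed_chores
    resources :completed_chores, only: [:index]
  end
  resources :chores
end
```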

Now that I have some endpoints, I start thinking about how to implement them.

For /users , this is going to be pretty straightforward.

For /chores, I see some ambiguity. Is a chore the action being performed, or a reusable description of a chore? In other words, is it "Bobby clean up his room" or "Clean bedroom. Pick up toys. Put away clothes."? The former is simpler, but it means you are going to repeat yourself when you create the "Sally clean up her room" task. The latter is going to be more complex, but you will be able to reuse the chore description multiple times. The latter is probably the more correct way, but it feels potentially over-engineered to me. When it comes to avoiding repeating myself, I tend to follow a Rule of 3s: if I repeat myself at least 3 times, I do something about it. Until then, I try to go the simple route for the sake of getting things done. (I also usually weigh the cost of changing my mind later. In this case, a chore will have a description field. If we decide to go with reusable descriptions, we could drop the description field and replace it with a task_id, where a Task contains the reusable description.)

Anyway, let's assume we don't mind repeating ourselves and stick with a chore representing "Clean Bobby's Room"; we'll just create another chore for "Clean Sally's Room". With that assumption, let's look at the types of actions we want to perform:

CREATE: We create a chore (description and frequency) and assign a user to it

READ: We return a chore (description and frequency) along with the assigned_user, the date of completion, and the user who completed it.
At this point I start to wonder: how do the date of completion and the user who completed it work with a recurring chore? If we only care about the last person to complete the chore, then this would be fine. I suspect, though, that paying allowance once a week means we want to see each time a daily chore was completed and who completed it. So let's try again:
READ: We return a chore (description and frequency) along with the assigned_user and a list of dates of completion tied to who completed it at that point in time.
This introduces a new model requirement. We need to be able to do something like @chore.completion_events (as I said, naming is hard... there's probably a better name than completion_events), so now we have a new endpoint:
/chores/1/completion_events
with CRUD actions, where we have a user and a completion date assigned to it. For the index action, we will probably want to be able to query on a date range, giving us the answer to the question "Who took out the trash in the last 7 days?" and other similar questions.

UPDATE: Since the act of completing a chore is now accomplished by creating a CompletionEvent, we only use update on chores to update the description and the assignee.

DELETE: Pretty straightforward.

As we continue to our other endpoints, it looks like we've answered most questions.

/users/1/assigned_chores will be a list of chores with user_id = 1. I would make this just an Index action, and leave CRUD up to the /chores endpoint.

/users/1/assigned_chores/{daily,weekly,monthly} are going to be extra routes pointing to /users/1/assigned_chores and setting a param for "frequency" or something that can be used to filter the query. This will then also be an Index-only action. Now that I think about it, these routes seem superfluous. They probably aren't needed.

/users/1/completed_chores is going to be a little complex. We will want to do a lookup on CompletionEvents by user_id, and then for each of those we will want to fetch the associated Chore. We will also most likely want a date range for this, so we can answer the question "What chores did Bobby do this week?"

So now, without even really touching Ruby or the underlying implementation, we see that we need 3 models:

User
Chore
CompletionEvent

with an optional 4th model (depending on a design decision):

Task (a Chore has a task_id so multiple chores can share a description)

The attributes:

User
  id
  name
Chore
  id
  task_id
  user_id
CompletionEvent
  id
  date
  user_id
  chore_id
Task
  id
  description

And the relationships (ActiveRecord associations):
User has many Chores
User has many CompletionEvents
Chore belongs to Task
Chore belongs to User
Chore has many CompletionEvents
CompletionEvent belongs to User
CompletionEvent belongs to Chore
Task has many Chores
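
Those relationships translate almost line for line into ActiveRecord associations. Here is a hypothetical sketch (the class and column names follow the lists above, nothing more):

```ruby
# Hypothetical ActiveRecord models matching the design above.
class User < ActiveRecord::Base
  has_many :chores
  has_many :completion_events
end

class Chore < ActiveRecord::Base
  belongs_to :task   # only if we go with reusable descriptions
  belongs_to :user
  has_many :completion_events
end

class CompletionEvent < ActiveRecord::Base
  belongs_to :user
  belongs_to :chore
end

class Task < ActiveRecord::Base
  has_many :chores
end
```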

And there you have it. That should be fairly complete (I'm sure you can attach other attributes, like how much to pay them in allowance and stuff like that). From there, implementation is going to mostly be trivial. You'll encounter some tricky implementation details like what @user.completed_chores actually looks like, but often it's just because there are so many good ways to implement it that sometimes we get hung up on finding the great way to implement it. I would focus on just getting things done at that point. Don't worry about whether you should be using a has_many :through with :conditions or whatever. Just get the data that you need. If you notice something is clumsy, look for a way to clean it up. But if it works then don't worry too much about the right way. It is far easier to learn a shortcut if you've done it the long way a couple of times.

I don't claim that this is the best solution, but I do feel it demonstrates a good balance between GSD (getting stuff done) and a well thought-out design. I'm sure there are better designs and I'm sure there are worse designs. In any case, I felt that it was worthy of a post and hope it will be a helpful resource to someone.

Friday, May 31, 2013

Copy as Curl

At work the other day I noticed something new in the Chrome browser developer console. Under the "Network" tab, if you right click on a request you are presented with a context menu of copy options.

The menu item that intrigued me was "Copy as Curl". For those who are not Unix savvy, curl is a common utility that allows you to fetch a web page without using a browser. It is a very useful tool in web development. Clicking "Copy as Curl" put a complete curl command on my clipboard, which I pasted into my terminal.

The command returned gibberish, but I noticed that it had the "Accept-Encoding: gzip,deflate,sdch" header set, which means the content was compressed. By piping the results into gunzip (and then jsonpp) I was able to get readable JSON output.
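
To make the shape of this concrete, here is a hypothetical reconstruction - the URL and headers are placeholders of mine, since the original command was shown in screenshots that have not survived:

```shell
# A "Copy as Curl" command looks roughly like this. Because the browser
# advertised gzip support, the raw response bytes come back compressed
# and need to be piped through gunzip:
curl -s 'http://www.example.com/product.json' \
  -H 'Accept-Encoding: gzip,deflate,sdch' | gunzip

# curl can also decompress for you:
curl -s --compressed 'http://www.example.com/product.json'
```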
Overall I thought it was a pretty cool feature. If you are still wondering why that output would be useful, it is because that output is extremely easy to parse from a programming perspective. You could use this to easily build a script that monitors the price of a specific laptop configuration, for example.

Well-made software provides solutions to needs we don't know about. I don't think I ever thought "I wish there was a way to make an identical request with curl", but now that I know about it, I will use it a ton.

Monday, March 11, 2013

SQLite and MongoDB had a baby

And I call it.... MongoLiteDB

I'm probably infringing on some trademarks and all sorts of proprietary things, but that's the name I gave it over at GitHub. If that's a problem, then I expect the affected parties will contact me and I will be accommodating.

What is it? If the name isn't clear, it combines some of the strengths of the two solutions while conveniently ignoring requirements I don't care about, namely performance. It provides a database that is very simple to set up and simple to use.

SQLite is a great lightweight database solution. It uses a single file to store and retrieve all entries and therefore does not require an external service. It does, however, require a schema, which can be tedious to maintain and update in a rapidly changing project. MongoDB is a great schema-free database. Without having to manage a schema, I can hack away to my heart's content. It does, however, require an external service. It isn't what I would call a lightweight solution.

In SQLite, a great deal of the performance gains result from the structured nature of the data. In MongoDB, the external service works all kinds of magic in order to provide a performant schema-free solution. Without either of these, performance probably sucks. (I haven't bothered to benchmark it.)

I decided I sometimes just didn't care about performance. In my simple applications, I don't expect to have significant traffic. I don't even think anyone but me will use the service. I want a solution that is as frictionless as possible during development and prototyping. Performance is my last concern. And so MongoLiteDB was born.

I probably spent more time building MongoLiteDB than it would have taken me to maintain my SQLite schema. I mostly built MongoLiteDB because it was something new and exciting. Hopefully I and others can get some use out of it.

Anyway, read more about it over on GitHub. It's still pretty raw. The code needs comments. The documentation is nonexistent. It's not even packaged as a gem yet. If it's interesting to you, please let me know. Leave comments on this blog. Fork the project. File issues. I think this would be a fun project to expand upon, and interest from the community will be a great motivator to do so.

Friday, April 27, 2012

Macro Tubes, Canon T3i and Focus Stacking

So I got a new toy this week - Macro Extension Tube Kit Compatible with Canon Cameras.  Essentially it's a set of spacers that tweak the focus range of standard lenses to allow them to shoot macro more effectively.  They can be a little clunky to use, as they are literally just a spacer between the lens and the camera.  This means you lose any connectivity between camera and lens - aperture control and autofocus are lost.  But they cost less than $10 shipped, so I figured they would be fun to play around with.  The alternative was to invest in a macro lens, which I can't justify.

Anyway, they arrived on a particularly boring day.  Construction in our neighborhood meant a planned power outage for half the day.  Seeing as I am studying Electrical and Computer Engineering, I didn't have much to do without Electricity.  So I got to play with them for a few hours, figuring out how to do everything a little more manually than usual.

So I took a few random shots:

Close up of my computer monitor. Each pixel is composed of 3 little bars - red, green, and blue - which can be seen quite clearly. This was zoomed in a bit from the original image and cropped.
A close up of my tool case. 
The tag on my son's stuffed animal.
They were some pretty random images, mostly the result of thinking "What could I take a close up picture of next?", but I was pleased with the clarity of using the macro tubes. They do create a pretty narrow depth of field, so manual focus is tricky, but it's fun to play around with.

One interesting subject I have been looking at is focus stacking. When you have a narrow depth of field, you can take multiple pictures of a subject, at different focuses, and later combine them into a complete image with a greater depth of field than any of the originals. So I gave it a shot by taking a picture of my wedding ring sitting on top of a white piece of paper. The software I used was Enfuse, which is a little tricky to use, but once set up it does a great job of pulling the whole image together.

The middle of the paper comes into focus, as well as the bottom edge of the ring.
The paper is in focus in the top right of the image.  The bottom left is out of focus.  The ring is completely out of focus.
The bottom left of the paper is in sharp focus, and a good part of the far side of the ring is in focus.
The paper is starting to move out of focus, as is the far side of the ring.
The paper is out of focus, and the near side of the ring is coming into focus.
The near side of the ring is in sharp focus.



And the final result:

Notice how the paper, the far side of the ring, and the near side of the ring are all in focus.

I used pretty conservative and standard settings with Enfuse. Some things are more in focus than others, which I could probably fix by tuning Enfuse a little more. I was very impressed by how well Enfuse combined the images on its own.

Overall, the macro tubes are fun to play around with.  Probably worth the $10 to have on hand in case you ever feel an artistic need to take macro shots.  Enfuse and focus stacking are definitely something I'm going to have to play around with more in the future.

Thursday, April 5, 2012

Using Raphael.js and Backbone.js together

I'm a big fan of Backbone.js. It's lightweight enough to throw into any webapp I'm building, large or small. At the very least it gives me some guidelines on how to organize my code. In larger apps the synchronization with the backend makes my life a lot easier.

An app I'm currently working on makes heavy use of Raphael.js, a great SVG library. In order to unify my handling of user events, I extended the Backbone.js library a little. It's a simple change, but I really like what it did to the organization of code.



My goal was to be able to address user interaction with Raphael objects the same way I address interaction with the DOM. DOM events can be specified under events via a "<selector> <event>" syntax, so one can do ".button click" or "#myElement customEvent". Since Raphael abstracts the DOM away from the SVG objects, I wanted to preserve the Raphael events but be able to throw them in with the DOM events. So the delegateRaphaelEvents method converts Raphael events to DOM events in the form "raphael:<object name>:<event>". The assigned callback function gets passed two objects: a jQuery event object and a Raphael event object.
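
The gist embedded in the original post is missing from this archive, so here is a hypothetical sketch of one piece of the idea: separating the view's events hash into normal DOM bindings and Raphael bindings by the "raphael:<object name>:<event>" naming convention. The function name and shapes are mine, not from the original code.

```javascript
// Split an events hash into DOM entries (for Backbone's own
// delegateEvents) and Raphael entries (for delegateRaphaelEvents to bind).
function partitionEvents(events) {
  var dom = {}, raphael = [];
  Object.keys(events).forEach(function (key) {
    var match = key.match(/^raphael:([^:]+):(.+)$/);
    if (match) {
      raphael.push({ object: match[1], event: match[2], handler: events[key] });
    } else {
      dom[key] = events[key];
    }
  });
  return { dom: dom, raphael: raphael };
}

// Example: one DOM binding and one Raphael binding side by side.
var split = partitionEvents({
  ".button click": "onButtonClick",
  "raphael:circle:mousedown": "onCircleDown"
});
```

In the real view, each entry in split.raphael would then be bound on the named Raphael object, with the handler receiving both event objects as described above.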

It works great and keeps my code well organized.  Enjoy!

Wednesday, March 14, 2012

March Madness

I just created a random bracket generator.

http://freerandombracket.com

Each time you load the page a new random bracket comes up.

Each matchup is decided by a very simple weighted algorithm:

Team Score = sqrt(Team Seed) * <random value between 0 and 100>

For each matchup, the score is calculated for both teams.  Whichever team has the lowest score moves on to the next round.

The page should be printable.

Enjoy!

EDIT: The above calculation was a little too loose. It meant that roughly 1 out of 8 matchups between 1 seeds and 16 seeds would be an upset. So I changed it to the following:

Team Score = (30 + Seed^2)*<random value>

This led to slightly more realistic matchups, where 9 times out of 10 a 1 seed would beat a 16 seed, a 1 seed would beat a 2 seed 10% more often than an upset would occur, and an 8 seed would win against a 9 seed 20% more often.
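
The revised rule is easy to sketch and sanity-check by simulation. (Hypothetical code - the function names are mine, not from the site's actual source.)

```javascript
// Score each team; the team with the LOWEST score advances.
function teamScore(seed) {
  return (30 + seed * seed) * Math.random();
}

function playMatchup(seedA, seedB) {
  return teamScore(seedA) < teamScore(seedB) ? seedA : seedB;
}

// Rough simulation of the 1-vs-16 matchup:
var trials = 100000;
var upsets = 0;
for (var i = 0; i < trials; i++) {
  if (playMatchup(1, 16) === 16) upsets++;
}
var upsetRate = upsets / trials;
// upsetRate lands around 0.05, so the 1 seed wins
// well over 9 times out of 10.
```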

I'm still playing with it, but as of right now that is the algorithm being used.