milmazz

Oban: Testing your Workers and Configuration

2022-02-21T00:00:00-06:00

In this article, I will continue talking about Oban, but I’ll focus on how to test your workers and your configuration.

This article is the second one of a series about Oban, an Elixir library for background job processing:

Oban: job processing library for Elixir
Oban: Testing your Workers and Configuration (you are here)
Oban in production

Testing the implementation of the Oban.Worker behaviour

Before implementing an Oban worker, I start by writing the unit test that will insert and execute the job. These tests are usually short because your workers should be as lean as possible. I tend to treat workers like resolvers or controllers: they orchestrate a series of actions that, most of the time, call functions that are already tested and belong to the business logic layer. In these unit tests, it also helps to validate that the job produces the expected side effects.

defmodule MyApp.Migrations.MyDataMigrationWorkerTest do
  use ExUnit.Case, async: true 
  use Oban.Testing, repo: MyApp.Repo

  alias MyApp.Migrations.MyDataMigrationWorker 

  describe "perform/1" do
    test "performs data migration" do
      # prepare your test
      assert {:ok, result} = perform_job(MyDataMigrationWorker, %{"my_arg" => 1})
      # assert the side-effects of the Oban Worker
    end
  end
end

In the previous test, we used the perform_job/2 helper from the Oban.Testing module. This helper constructs a job and executes it with the given worker module. Apart from reducing the ceremony of constructing jobs for unit tests, one of the neat things about perform_job/2 is that it makes some assertions for us.

From the docs, we have the following:

That the worker implements the Oban.Worker behaviour

That the options provided build a valid job

That the return is valid, e.g. :ok, {:ok, value}, {:error, value}, etc.

If all of the assertions pass, the function returns the result of perform/1 for you to make additional assertions.

Apart from perform_job/2,3, there are other testing helpers, such as assert_enqueued/2,3, all_enqueued/2, and refute_enqueued/2,3. For more information, please check the Oban.Testing documentation.

It is important to note that if you want to test a worker included in Oban Pro like the Batch, Chunk, or Workflow, you should import the module Oban.Pro.Testing and use the process_job/2,3 helper instead:

import Oban.Pro.Testing

alias MyApp.MyBatchWorker

describe "process/1" do
  test "archives the given thing" do
    # prepare your test
    assert :ok == process_job(MyBatchWorker, %{thing_id: thing.id})
    # assert the side-effects of the Oban Batch Worker
  end
end

One more important thing about testing: don’t forget to test your Oban configuration. Let’s talk about it next.

Testing your Oban Configuration

Do not wait until you deploy your application to your staging environment to catch errors in your Oban configuration. Try to catch those errors as early as possible; you can start with a test like this:

defmodule MyApp.ObanConfigTest do
  use ExUnit.Case, async: true

  test "check oban configuration" do
    [config, prod] =
      Enum.map(["config/config.exs", "config/prod.exs"], fn path ->
        path
        |> Config.Reader.read!(imports: [], env: :prod, target: :host)
        |> get_in([:my_app, Oban])
      end)

    assert %Oban.Config{plugins: _plugins} =
             config
             |> Keyword.merge(prod)
             |> Oban.Config.new()
  end
end

NOTE: In the previous test, I merged the Oban config that comes from config/config.exs and config/prod.exs; you should adapt this test based on your scenario.
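One thing to keep in mind when adapting the test: Keyword.merge/2 is shallow, so a nested option such as queues in config/prod.exs replaces the whole value from config/config.exs, whereas Config.Reader.merge/2 merges nested keyword lists key by key. The following sketch uses made-up configuration values (they are assumptions, not taken from a real project) to show the difference; whether it matters depends on how your own config files are laid out.

```elixir
# Hypothetical values standing in for config/config.exs and config/prod.exs.
base = [my_app: [{Oban, [repo: MyApp.Repo, queues: [default: 5, uno: 3]]}]]
prod = [my_app: [{Oban, [queues: [default: 10]]}]]

# Shallow merge: the prod `queues` list replaces the base one entirely.
shallow = Keyword.merge(get_in(base, [:my_app, Oban]), get_in(prod, [:my_app, Oban]))

# Deep merge: nested keyword lists are merged key by key.
deep = base |> Config.Reader.merge(prod) |> get_in([:my_app, Oban])

IO.inspect(shallow[:queues], label: "shallow")
IO.inspect(Enum.sort(deep[:queues]), label: "deep")
```

If your environment files only override part of a nested option, merging the full configurations with Config.Reader.merge/2 before extracting the Oban key may be closer to what the application sees at boot.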

But, for the previous unit test to make sense, you need to know a bit about the internal implementation of Oban. The gist is this: when you start the supervision tree for Oban, it first tries to create an Oban.Config struct via Oban.Config.new/1, and if you supply the wrong configuration, it raises an error.

For example, let’s assume you made a typo in the repo option:

$ mix test test/my_app/oban_config_test.exs

1) test check oban configuration (MyApp.ObanConfigTest)
   test/my_app/oban_config_test.exs:4
   ** (ArgumentError) expected :repo to be an Ecto.Repo, got: MyApp.Rep0
   code: |> Oban.Config.new()
   stacktrace:
     ...

Once you fix the typo, you get:

$ mix test test/my_app/oban_config_test.exs
.
Finished in 0.02 seconds (0.02s async, 0.00s sync)
1 test, 0 failures

It is important to note that the Oban.Config.new/1 function doesn’t validate the internal configuration for the plugins; it only validates that the given plugin is an atom, that it is loaded, that it exports an init/1 function, and that the given opts is a keyword list. We will explore a workaround for this limitation in the following sub-section.
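To make the limitation concrete, here is a minimal sketch of the shape checks described above. PluginShapeSketch and FakePlugin are hypothetical names introduced only for illustration; this is not Oban’s source code, just the same four checks written with standard library functions.

```elixir
defmodule PluginShapeSketch do
  @moduledoc "Hypothetical helper; it mirrors the checks described above, not Oban's source."

  # Accept both the bare-module and the {module, opts} plugin forms.
  def check(plugin) when is_atom(plugin), do: check({plugin, []})

  def check({plugin, opts}) do
    cond do
      not is_atom(plugin) -> {:error, :not_an_atom}
      not Code.ensure_loaded?(plugin) -> {:error, :not_loaded}
      not function_exported?(plugin, :init, 1) -> {:error, :missing_init}
      not Keyword.keyword?(opts) -> {:error, :opts_not_keyword}
      true -> :ok
    end
  end
end

defmodule FakePlugin do
  # Stand-in plugin used only for this sketch.
  def init(opts), do: {:ok, opts}
end

IO.inspect(PluginShapeSketch.check({FakePlugin, interval: 25}))
# The option value is nonsense, but the shape checks still pass.
IO.inspect(PluginShapeSketch.check({FakePlugin, interval: "soon"}))
IO.inspect(PluginShapeSketch.check({MissingPlugin, []}))
```

Note how an option with a meaningless value still passes: checks of this kind only look at the shape of the plugin entry, not at what each option means.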

Testing your plugins configuration

Previously, I mentioned that you should test your Oban configuration. Still, that approach has a limitation: Oban.Config.new/1 doesn’t validate the internal configuration that you pass to each plugin listed under the plugins option.

test "check oban configuration" do
  # ...
  assert %Oban.Config{plugins: _plugins} =
           config
           |> Keyword.merge(prod)
           |> Oban.Config.new()
end

So, if you want to go further and validate the given options to the Oban.Plugins.Cron plugin, for example, you need to know a bit about the internal implementation and use Oban.Plugins.Cron.validate!/1 and assert that the returned value is :ok.

I like to validate the cron plugin configuration because it constantly evolves, and doing so makes it easy to catch typos in the cron worker module names as well as invalid cron expressions. You can do it like this:

assert %Oban.Config{plugins: plugins} =
         config
         |> Keyword.merge(prod)
         |> Oban.Config.new()

{_, cron_opts} = Enum.find(plugins, &match?({Oban.Plugins.Cron, _}, &1))

assert :ok = Oban.Plugins.Cron.validate!(cron_opts)

A word of caution about the previous test segment: although Oban.Plugins.Cron.validate!/1 is a public function, it is marked with @doc false, which usually means that the function is for internal use. There could be unannounced breaking changes in the future, but you gain the following checks:

That the value for the crontab option is a list.
That, if you use the timezone option, the given value is a known timezone.

For example, let’s assume that you’re using an invalid timezone:

$ mix test test/my_app/oban_config_test.exs

1) test check oban configuration (MyApp.ObanConfigTest)
   test/my_app/oban_config_test.exs:4
   ** (ArgumentError) expected :timezone to be a known timezone
   code: assert :ok = Oban.Plugins.Cron.validate!(cron_opts)
   stacktrace:
     ...

In the specific case of the cron plugin, the validate!/1 function also parses the crontab expressions; if an expression is invalid, the validation fails. It also validates that the given worker module is loaded and implements the perform/1 callback, so it catches possible typos in module names:

$ mix test test/my_app/oban_config_test.exs

1) test check oban configuration (MyApp.ObanConfigTest)
   test/my_app/oban_config_test.exs:4
   ** (ArgumentError) MyApp.DailWorker not found or can't be loaded
   code: assert :ok = Oban.Plugins.Cron.validate!(cron_opts)

In the previous test, you can see that we tried to load MyApp.DailWorker, but the correct name for that module is MyApp.DailyWorker.
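If you prefer not to depend on an undocumented function for this particular check, a similar typo guard can be sketched with the standard library alone. CrontabWorkerSketch is a hypothetical helper, the crontab entries are made up, and the stand-in worker below only defines perform/1 (a real worker would use Oban.Worker). It does not parse cron expressions.

```elixir
defmodule CrontabWorkerSketch do
  @moduledoc "Hypothetical helper, not part of Oban."

  # Returns the worker modules that cannot be loaded or lack perform/1.
  # Entries may be {expression, worker} or {expression, worker, opts}.
  def missing_workers(crontab) do
    crontab
    |> Enum.map(&elem(&1, 1))
    |> Enum.reject(fn worker ->
      Code.ensure_loaded?(worker) and function_exported?(worker, :perform, 1)
    end)
  end
end

defmodule MyApp.DailyWorker do
  # Stand-in worker for the sketch; a real one would `use Oban.Worker`.
  def perform(_job), do: :ok
end

crontab = [
  {"0 0 * * *", MyApp.DailyWorker},
  {"0 12 * * *", MyApp.DailWorker, args: %{mode: "noon"}}
]

IO.inspect(CrontabWorkerSketch.missing_workers(crontab), label: "missing workers")
```

In a test, you could assert that the returned list is empty, which reports every misspelled module at once.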

I think it is worth taking the risk of using a non-documented public function in this case; you gain more in your daily routine because it will allow you to catch errors in your Oban Configuration as soon as possible.

After talking with Parker about the previous challenges, he suggested opening the following issue in the Oban repository: Improve the testing experience for Oban and its plugins

Testing workers included in Oban Pro

In a previous section, I mentioned that when you’re testing a worker included in Oban Pro, like the Batch, Chunk, or Workflow, you should import the module Oban.Pro.Testing and use the process_job/2,3 helper instead.

Possibly, the most challenging worker to test is the Batch worker, especially if you want to exercise the whole cycle, including its handle callbacks. Let’s start with a dummy Batch worker, and then we can talk about the details of its tests.

So, let’s imagine the following worker:

defmodule MyApp.MyBatchWorker do
  use Oban.Pro.Workers.Batch,
    queue: :default

  require Logger

  @impl Batch
  def process(_) do
    :ok
  end

  @impl Batch
  def handle_exhausted(%Job{} = _job) do
    Logger.info("exhausted callback")
  end
end

The body of the MyApp.MyBatchWorker module is very similar to the usual Oban.Worker. The exception is that you handle your work in process/1 instead of perform/1; the latter is used internally. One of the excellent additions of the Oban.Pro.Workers.Batch behaviour is that it lets you define a few callbacks, which are well explained in the behaviour documentation. In this example, I’m using the handle_exhausted callback, which is called after all jobs in the batch are in either a completed or discarded state. To keep the example as simple as possible, I’m just logging that the callback was called, but you can do whatever you want here.

Okay, now the exciting part, let’s see how to test this worker.

defmodule MyApp.MyBatchWorkerTest do
  use ExUnit.Case

  import Oban.Pro.Testing

  alias MyApp.MyBatchWorker

  describe "process/1" do
    test "processes the background job" do
      assert :ok = process_job(MyBatchWorker, %{})
    end
  end
end

Here we’re testing the “bulk” of the worker. Now, let’s check one of the handler callbacks as a unit:

  use Oban.Testing, repo: MyApp.Repo
  
  describe "handle_exhausted/1" do
    test "handle exhausted behaves correctly" do
      assert :ok = perform_job(MyBatchWorker, %{}, meta: %{"callback" => "exhausted"})
    end
  end

As far as I know, the test helper process_job/2,3 will not work in the previous test because it ends up calling our MyApp.MyBatchWorker.process/1, and we actually want to exercise MyApp.MyBatchWorker.perform/1, which coordinates the execution of the callbacks. Again, here you need to know how Oban works internally. That’s why I ended up using perform_job/3 and passing the meta option. But please be aware that perform_job/3 will not complain if the last expression of your handle callback returns things like {:ok, value}, {:error, value}, etc. Remember that the handle callbacks contract specifies that you must return :ok. To fix this issue, a possible solution that is being considered is to include a perform_callback/3 helper under the Oban.Pro.Testing module.

Now, let’s assume that we want to go further in our test and check the whole cycle. In that case, we do:

defmodule MyApp.MyBatchWorkerTest do
  use ExUnit.Case

  import ExUnit.CaptureLog
  
  alias MyApp.MyBatchWorker

  require Logger
  
  # ...

  setup do
    oban_name = start_supervised_oban!(queues: [default: 3])
    Logger.put_module_level(MyBatchWorker, :all)

    on_exit(fn ->
      Logger.delete_module_level(MyBatchWorker)
    end)

    %{oban_name: oban_name}
  end

  test "testing our batch worker", %{oban_name: oban_name} do
    batch =
      ["foo@example.com", "bar@example.com"]
      |> Enum.map(&%{email: &1})
      |> MyBatchWorker.new_batch()

    log =
      capture_log(fn ->
        Oban.insert_all(oban_name, batch)
        Oban.drain_queue(oban_name, queue: :default)
      end)
      
    assert log =~ "exhausted callback"
  end

  defp start_supervised_oban!(opts) do
    default_opts = [
      name: make_ref(),
      repo: MyApp.Repo,
      plugins: [
        {Oban.Pro.Plugins.BatchManager, debounce_interval: 5},
        {Oban.Plugins.Repeater, interval: 25}
      ],
      poll_interval: :timer.minutes(10),
      shutdown_grace_period: 25
    ]

    opts = Keyword.merge(default_opts, opts)
    name = opts[:name]

    start_supervised!({Oban, opts})

    name
  end
end

Wow, I know this is a lot to digest at once, so let’s split the previous snippet.

If you saw Parker’s talk Testing Oban Jobs From the Inside Out, you may remember that he calls this kind of testing “inline testing”. It should be used when you absolutely must run jobs normally during your test (e.g., LiveView, browser testing) or, as in my case, when you want to test the whole path in your Batch workers. So, let’s start by analyzing the start_supervised_oban!/1 helper:

  defp start_supervised_oban!(opts) do
    default_opts = [
      name: make_ref(),
      repo: MyApp.Repo,
      plugins: [
        {Oban.Pro.Plugins.BatchManager, debounce_interval: 5},
        {Oban.Plugins.Repeater, interval: 25}
      ],
      poll_interval: :timer.minutes(10),
      shutdown_grace_period: 25
    ]

    opts = Keyword.merge(default_opts, opts)
    name = opts[:name]

    start_supervised!({Oban, opts})

    name
  end

Here, with the help of start_supervised!/2, which comes with ExUnit, we’re setting up an Oban supervisor with some default options; the essential thing in the previous snippet is the plugins. So first, Oban.Pro.Plugins.BatchManager is needed because that’s the plugin that will track the execution of the Oban jobs within a batch and enqueue the callback jobs. So, yes, the callbacks are also Oban Jobs.

Then, we have the Oban.Plugins.Repeater plugin, which will poll every 25 milliseconds to look for new jobs. This plugin is essential in our case because inside the unit tests, PostgreSQL notifications don’t work. We’re using the Ecto Sandbox, which means that every unit test runs inside a transaction. Also, note that I’m using make_ref to create a unique reference that we will use as the name for the Oban supervisor; you cannot have more than one supervisor with the same id.
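The point about unique names can be illustrated with the standard library alone. The sketch below is only an analogy: it uses a plain Registry and Agent processes (NameSketchRegistry and start_named are made-up names) rather than Oban itself, to show why a fresh reference per test avoids name collisions.

```elixir
# Stdlib-only analogy: a Registry lets any term, including a reference, act as a name.
{:ok, _} = Registry.start_link(keys: :unique, name: NameSketchRegistry)

start_named = fn name ->
  Agent.start_link(fn -> :idle end, name: {:via, Registry, {NameSketchRegistry, name}})
end

# Reusing one fixed name fails on the second start...
{:ok, first} = start_named.(:shared_name)
{:error, {:already_started, ^first}} = start_named.(:shared_name)

# ...while a fresh reference per start never collides.
{:ok, a} = start_named.(make_ref())
{:ok, b} = start_named.(make_ref())

IO.inspect(a != b, label: "distinct processes")
```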

  setup do
    oban_name = start_supervised_oban!(queues: [default: 3])
    Logger.put_module_level(MyBatchWorker, :all)

    on_exit(fn ->
      Logger.delete_module_level(MyBatchWorker)
    end)

    %{oban_name: oban_name}
  end

Then, in our setup function, we use the previous helper start_supervised_oban!/1 to start a new Oban supervisor that will only handle jobs for the default queue. Next, we enable all the logging levels for the MyApp.MyBatchWorker module; the on_exit/1 callback cleans up this setting at the end of our test. Finally, we put the Oban supervisor’s name in the test context.

  test "testing our batch worker", %{oban_name: oban_name} do
    batch =
      ["foo@example.com", "bar@example.com"]
      |> Enum.map(&%{email: &1})
      |> MyBatchWorker.new_batch()

    assert capture_log(fn ->
             Oban.insert_all(oban_name, batch)
             Oban.drain_queue(oban_name, queue: :default)
           end) =~ "exhausted callback"
  end

And here is where we test our batch worker. First, we create a new batch with MyApp.MyBatchWorker.new_batch/1 and immediately insert it with Oban.insert_all/2. It is essential to pass the oban_name as the first argument to that function. Right after that, we drain the default queue using Oban.drain_queue/2. Here, I’m testing that handle_exhausted/1 is called by capturing the log, which is the side effect produced by that callback. As I said before, you can do whatever you want here; you just need to check the side effects produced by your callback.

But wait, something is missing in our previous start_supervised_oban!/1 helper. Do you remember that I mentioned before that the plugins implement the GenServer behaviour? Also, in the test environment, we’re using Ecto.Adapters.SQL.Sandbox to allow concurrent transactional tests. So, to run these tests successfully under these conditions, we need to let the Oban processes collaborate with the test process. All these processes should use the same connection, so they all belong to the same transaction.

From the Ecto.Adapters.SQL.Sandbox documentation, we have the following:

The test above will fail with an error similar to:

** (DBConnection.OwnershipError) cannot find ownership process for #PID<0.35.0>

That’s because the setup block is checking out the connection only for the test process. Once the worker attempts to perform a query, there is no connection assigned to it and it will fail.

The sandbox module provides two ways of doing so, via allowances or by running in shared mode.

And also, we have:

The idea behind allowances is that you can explicitly tell a process which checked out connection it should use, allowing multiple processes to collaborate over the same connection.

So, we need to include this allowance in our start_supervised_oban!/1 helper. Thankfully, Parker recently shared the following gist in the #oban channel to demonstrate how to integration test batch callbacks. There, you can see this clever piece that allows the Oban producers, plugins, and other processes to use the checked-out connection:

  # For Oban < v2.11 you can remove the `{:_, Oban.Peer}` part
  for key <- [{:_, Oban.Peer}, {:_, {:producer, :_}}, {:_, {:plugin, :_}}],
      pid <- Registry.select(Oban.Registry, [{{key, :"$2", :_}, [], [:"$2"]}]) do
    Ecto.Adapters.SQL.Sandbox.allow(MyApp.Repo, self(), pid)
  end

But again, this piece of code assumes you know some internals, so I will try to explain it a bit.

In this excellent PR, Saša Jurić introduced Registry into Oban to replace all the locally registered names, and also to hold the configuration for those processes as part of the Registry metadata.

So, if you select everything that’s in the Oban Registry, you see something like:

iex(1)> Registry.select(Oban.Registry, [{{:"$1", :"$2", :"$3"}, [], [{{:"$1", :"$2", :"$3"}}]}])
[
  {{Oban, {:watchman, "default"}}, #PID<0.576.0>, nil},
  {{Oban, Oban.Notifier}, #PID<0.568.0>, nil},
  {{Oban, {:plugin, Oban.Plugins.Stager}}, #PID<0.571.0>, nil},
  {{Oban, {:foreman, "default"}}, #PID<0.574.0>, nil},
  {{Oban, {:plugin, Oban.Plugins.Pruner}}, #PID<0.570.0>, nil},
  {{Oban, {:supervisor, "default"}}, #PID<0.573.0>, nil},
  {{Oban, {:producer, "default"}}, #PID<0.575.0>, nil},
  {Oban, #PID<0.567.0>, %Oban.Config{...}},
  {{Oban, Oban.Midwife}, #PID<0.569.0>, nil}
]

Then, based on the previous Registry output, the for comprehension filters the results by key pattern and returns the pid (a.k.a. :"$2") of each match. Once we get the pid, we explicitly allow each of these Oban processes to use the test process’s connection.
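To see the match specification at work without a running Oban instance, the following sketch registers a few throwaway processes in a local Registry under keys shaped like the ones in the output above, then selects only the producer and plugin pids. SketchRegistry and the spawned processes are stand-ins introduced for this example; only Registry.select/2 and the key patterns come from the snippet being explained.

```elixir
{:ok, _} = Registry.start_link(keys: :unique, name: SketchRegistry)
test_pid = self()

# Keys shaped like the ones Oban stores in its registry.
keys = [
  {Oban, {:producer, "default"}},
  {Oban, {:plugin, Oban.Plugins.Stager}},
  {Oban, Oban.Notifier}
]

# Register one throwaway process per key and remember its pid.
registered =
  Map.new(keys, fn key ->
    pid =
      spawn(fn ->
        {:ok, _} = Registry.register(SketchRegistry, key, nil)
        send(test_pid, {:registered, key})
        Process.sleep(:infinity)
      end)

    receive do
      {:registered, ^key} -> {key, pid}
    end
  end)

# Same match specification as in the allowance snippet: `:_` is a wildcard,
# `:"$2"` binds the pid, and the body returns only that pid.
selected =
  for pattern <- [{:_, {:producer, :_}}, {:_, {:plugin, :_}}],
      pid <- Registry.select(SketchRegistry, [{{pattern, :"$2", :_}, [], [:"$2"]}]) do
    pid
  end

IO.inspect(length(selected), label: "matched processes")
```

The notifier process is registered too, but it matches neither pattern, so it is left out of the result.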

And with the previous pattern, you can execute e2e tests, integration testing, and so forth.

Finally, if you want to know more about how to test Oban jobs, I highly recommend watching the talk Testing Oban Jobs From the Inside Out from Parker Selbert.

Conclusion

Oban offers some good helpers to run your unit and integration tests. Pay special attention to the Oban.Testing documentation, and keep in mind that you also have helpers to test workers included in Oban Pro, such as Batch, Chunk, or Workflow.

But, indeed, there is always room for improvement. Especially when you want to test your Oban configuration as soon as possible, it would be awesome to have helpers that validate all the settings, including plugins, at once. Also, in my opinion, the plugins should have a uniform way to test their configuration. As I mentioned in this issue, a possible solution could be:

A public test helper to validate the Oban configuration, including plugins. I think it’s also worth normalizing the plugins under a contract or behavior; Oban now enforces defining an init/1 function for plugins, but I think the behaviour should also include a validate!/1 function. At least the validate!/1 should help with the testing experience.

There are also plugins in the Pro package that could include complex configurations, like the Oban.Pro.Plugins.DynamicPruner. So, I think it’s worth it to offer a unified way to validate these configs.

I hope you find this article helpful, and you have a better idea of how to test your Oban Workers and your Oban Configuration.

If you want to reach me, you can do it at @milmazz on Twitter, or you can find me in the #oban channel on the Elixir Slack.

That’s all, folks! Thanks for reading.

Acknowledgments

Thank you, Parker Selbert, and Andrew Ek for reviewing drafts of this post.

Oban: job processing library for Elixir

2022-02-11T12:27:31-06:00

After working for years in different organizations, I’ve found that one common theme is scheduling background jobs. In this article, I’ll share my experience with Oban, an open-source job processing package for Elixir. I’ll also cover some features, like real-time monitoring with Oban Web and complex workflow management with Oban Pro.

This article is the first of a series; later we will explore other important topics such as:

Oban: Testing your Workers and Configuration
Oban in production

Stay tuned!

Let’s begin with an overview of Oban.

Oban Overview

When I talk about Oban in this context, I’m not referring to the town in Scotland. Instead, I’m talking about an Elixir library that offers a background job system built on top of PostgreSQL with the primary goals of reliability, consistency, and observability. One of the cool features of Oban, given that it’s built on top of PostgreSQL, is that you can enqueue jobs along with other database changes, ensuring that everything is committed or rolled back atomically.

I will also talk about Oban Pro, which, according to the official site:

Oban Pro is a collection of plugins, workers and extensions that improve Oban’s reliability and make difficult workflows possible.

You can see a comparison between the open-source and the paid version here.

Let’s start by examining the following configuration:

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 5, uno: 3],
  engine: Oban.Pro.Queue.SmartEngine,
  plugins: ...

The repo option specifies the Ecto repository used to insert and retrieve jobs. Unless you set it to false, the queues option is a keyword list where each key is a queue name and each value is that queue’s concurrency limit. In our configuration, we define two queues: default, with a local concurrency limit of 5, and uno, with a local concurrency limit of 3, which I will use later to explain how to schedule one-off jobs.

We can extend Oban functionality via plugins and callback modules as I already mentioned. Oban Pro takes advantage of this feature providing a collection of plugins, workers, and extensions.

One extension offered by Oban Pro is the Oban.Pro.Queue.SmartEngine. It is an alternate queue engine that enables true global concurrency and global rate limiting. The open-source package offers a limited basic engine, Oban.Queue.BasicEngine.

The following list of plugins is for an advanced configuration, and it uses plugins from both Oban and Oban Pro. Please adjust the following based on your scenarios:

  plugins: [
    Oban.Plugins.Gossip,
    Oban.Plugins.Stager,
    Oban.Pro.Plugins.BatchManager,
    Oban.Pro.Plugins.Lifeline,
    Oban.Pro.Plugins.Reprioritizer,
    {
      Oban.Pro.Plugins.DynamicPruner,
      mode: {:max_age, {1, :day}},
      limit: 25_000,
      queue_overrides: [
        uno: {:max_age, :infinity}
      ],
      state_overrides: [
        cancelled: {:max_age, {5, :days}},
        discarded: {:max_age, {5, :days}}
      ]
    },
    {
      Oban.Pro.Plugins.DynamicCron,
      crontab: [
        {"30 7 * * *", MyApp.MyWorker},
        {"@reboot", MyApp.Migrations.MyDataMigrationWorker}
      ]
    }
  ]

You can see that we’re using some plugins, like Oban.Pro.Plugins.Lifeline, which offers a way to rescue orphaned jobs, or Oban.Pro.Plugins.Reprioritizer, which prevents queue starvation by automatically adjusting priorities to ensure all jobs are eventually processed. The latter is handy when you’re using different priorities in your Oban.Worker; a classic example is given in the plugin documentation:

For example, a queue that processes jobs from various customers may prioritize customers that are in a higher tier or plan. All high priority (0) jobs are guaranteed to run before any with lower priority (1..3), which is wonderful for the higher tier customers but can lead to resource starvation. When there is a constant flow of high priority jobs the lower priority jobs will never get the chance to run.

The Reprioritizer plugin automatically adjusts lower job’s priorities so that all jobs are eventually processed.
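The starvation problem described in that quote can be reproduced with a toy simulation. The sketch below is not the Reprioritizer’s algorithm; StarvationSketch, the tick model (one high-priority job arrives and one job runs per tick), and the bump interval are all assumptions chosen to keep the example small.

```elixir
defmodule StarvationSketch do
  @moduledoc "Toy model of priority starvation; not how the Reprioritizer plugin works."

  # Returns the tick at which the low-priority job runs, or nil if it never
  # runs within `ticks`. Pass nil as `reprioritize_every` to disable bumping.
  def run(ticks, reprioritize_every) do
    low = %{id: :low, priority: 3, inserted: 0}

    result =
      Enum.reduce_while(1..ticks, [low], fn tick, queue ->
        # A new high-priority job arrives on every tick.
        queue = [%{id: {:high, tick}, priority: 0, inserted: tick} | queue]
        queue = maybe_bump(queue, tick, reprioritize_every)

        # Lowest priority number first; older jobs win ties.
        next = Enum.min_by(queue, &{&1.priority, &1.inserted})

        if next.id == :low do
          {:halt, tick}
        else
          {:cont, List.delete(queue, next)}
        end
      end)

    if is_integer(result), do: result, else: nil
  end

  # Every `every` ticks, move each waiting job one step closer to priority 0.
  defp maybe_bump(queue, tick, every) when is_integer(every) and rem(tick, every) == 0 do
    Enum.map(queue, &%{&1 | priority: max(&1.priority - 1, 0)})
  end

  defp maybe_bump(queue, _tick, _every), do: queue
end

IO.inspect(StarvationSketch.run(20, nil), label: "without bumping")
IO.inspect(StarvationSketch.run(20, 2), label: "bumping every 2 ticks")
```

Without bumping, the low-priority job never runs in this model; with a bump every two ticks, it reaches priority 0 on tick 6 and runs ahead of the newer job because it is older.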

In another section of this article, I will be giving more details about the Oban.Pro.Plugins.DynamicPruner plugin, so let’s skip that for now.

At the end of our plugin list, you can see that we’re using Oban.Pro.Plugins.DynamicCron, which is an advanced version of the Oban.Plugins.Cron plugin for cron scheduling. The pro version allows changing its configuration globally across your entire cluster at runtime.

For more details about Oban, visit the official site at getoban.pro or on HexDocs at hexdocs.pm/oban

Now let’s review some of the features from Oban Web.

Oban Web

The best place to check what Oban Web has to offer is the live Web dashboard demo, according to the authors:

[The Oban Web Dashboard Demo is] a playful combination of randomly generated workers using fake data and random failures makes the demo a chaotic simulation of a production workload. … The demo is a beautiful canary because it uses the latest OSS, Web, and Pro releases, utilizing all the plugins and most available features. With error monitoring, we receive notifications that help us diagnose and fix issues from a constantly running production instance, often (but not always) before any customers report a problem! It’s crowdsourcing and dogfooding rolled into one.

Let’s start reviewing a few parts from this live demo.

Dashboard

In this image, you can see the list of jobs that Oban is executing. On the sidebar, you see different sections, such as nodes, states, and queues.

Currently, they have two nodes acting as workers to run Oban Jobs in this demo.

Each node has six queues. One of them is analysis, which has a local limit of 20 on each node, giving us a total of 40, which is what you see in the limit column. You can see that the mailers and media queues have some symbols in the mode column. The downward line-chart icon on the media and mailers queues means that those queues are rate limited, which is possible because this demo uses the SmartEngine. The media queue has another icon, the globe, meaning that it has a global limit in the cluster.

You can also see that the demo has already completed more than 359k jobs, and there is just one job available. One thing to notice here is that regardless of the number of jobs available, Oban will not process more jobs than is allowed on each queue at a given time, imposing a back-pressure mechanism, which is essential to avoid overloading our system.
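The back-pressure idea can be sketched with the standard library: no matter how many jobs are waiting, only a fixed number run at once. The example uses Task.async_stream/3 with max_concurrency as a stand-in for a queue’s local limit; it is an illustration of the concept under that assumption, not how Oban schedules jobs.

```elixir
limit = 3

# Tracks how many "jobs" are running right now and the highest value seen.
{:ok, tracker} = Agent.start_link(fn -> %{running: 0, peak: 0} end)

results =
  1..10
  |> Task.async_stream(
    fn job_id ->
      Agent.update(tracker, fn %{running: running, peak: peak} ->
        %{running: running + 1, peak: max(peak, running + 1)}
      end)

      # Simulate some work.
      Process.sleep(10)

      Agent.update(tracker, fn state -> %{state | running: state.running - 1} end)
      job_id
    end,
    max_concurrency: limit
  )
  |> Enum.map(fn {:ok, job_id} -> job_id end)

%{peak: peak} = Agent.get(tracker, & &1)

IO.inspect(length(results), label: "jobs processed")
IO.inspect(peak <= limit, label: "peak stayed within the limit")
```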

Job details

In this image, you can see the job details in real-time; for example, in this capture, you can see the current state, the specific arguments for this job, which node is executing this job, schedule time, and so on. Note that you have a button on the upper right side that could allow you to cancel the job that’s being executed.

In this image, you can see that a specific job is completed without errors.

But in this image, you can see that this job was discarded, meaning that it reached the maximum number of attempts and failed on each try. This traceback view increases observability and could help you find an issue in your code. Or, if you’re dealing with a flaky third-party service, you can hit the Retry button, for example.

Queues

If you press the Queues tab, you can see more details per queue, including nodes, how many jobs per queue are available, their local and global limit, if they are rate-limited, and so on. If you have the proper permissions, you can even stop or resume each queue on-demand from here. For example, in the previous screenshot, you can see that I stopped the analysis queue in one of the available nodes.

Smart Engine extension

Under the Queues tab, you will see an image similar to the previous one if you click on a queue name. Using the SmartEngine, you can set limits by rate, global, or both per queue.

One neat feature of the rate limit section is that you can go a bit further and apply partitioned rate-limiting by worker, args, or both within a queue. For example, in the media queue, you can see that there is a local limit of 10, a global limit of 25, but only 20 jobs are allowed per worker (Partition Field), every 60 seconds, across every instance of the media queue in this cluster.

Here you can see that the mailers queue is rate limited. We can define Oban Jobs or Workers that interact with external services, and we just set the rate limit at the queue level. There is no need to worry about rate limit implementation at the worker level.
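To get an intuition for partitioned rate limiting, here is a small fixed-window counter keyed by worker name. It is a sketch under simplifying assumptions (a single process, integer timestamps in seconds, made-up worker names) and is not the SmartEngine’s implementation; PartitionedRateSketch is a hypothetical module.

```elixir
defmodule PartitionedRateSketch do
  @moduledoc "Toy fixed-window limiter; `state` maps partition => {window_start, count}."

  # Returns {:ok | :rate_limited, new_state} for one attempt at time `now`.
  def check(state, partition, now, allowed, period) do
    {window_start, count} = Map.get(state, partition, {now, 0})

    # Start a fresh window once the period has elapsed.
    {window_start, count} =
      if now - window_start >= period, do: {now, 0}, else: {window_start, count}

    if count < allowed do
      {:ok, Map.put(state, partition, {window_start, count + 1})}
    else
      {:rate_limited, Map.put(state, partition, {window_start, count})}
    end
  end
end

# 25 attempts by one worker within the same window: 20 allowed per 60 seconds.
{results, state} =
  Enum.map_reduce(1..25, %{}, fn _attempt, state ->
    PartitionedRateSketch.check(state, "MyApp.MediaWorker", 0, 20, 60)
  end)

# Another worker has its own partition, so it is not affected.
{other, state} = PartitionedRateSketch.check(state, "MyApp.OtherWorker", 30, 20, 60)

# Once the window has elapsed, the first worker is allowed again.
{later, _state} = PartitionedRateSketch.check(state, "MyApp.MediaWorker", 60, 20, 60)

IO.inspect(Enum.frequencies(results), label: "first window")
IO.inspect({other, later}, label: "other partition, next window")
```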

Okay, I covered enough Oban Pro and Oban Web features in this section. But if you want to know more details about the Oban architecture, I recommend checking the slides of The Architecture of Oban from Parker Selbert.

Now, let’s review some conventions that I tend to follow.

Conventions

In the following sub-sections, I will briefly examine some conventions I like to follow when using Oban.

Naming and file/directory organization

I tend to follow this code organization for Oban workers; adding subdirectories under the workers directory is valid.

my_app/
├── README.md
├── lib
│   ├── my_app
│   │   └── workers
│   │       └── archive_account.ex
│   └── my_app.ex
├── mix.exs
└── test
    ├── my_app
    │   └── workers
    │       └── archive_account_test.exs
    ├── my_app_test.exs
    └── test_helper.exs

And the module naming is as follows:

MyApp.Workers.ArchiveAccount for the worker implementation
MyApp.Workers.ArchiveAccountTest for the unit tests associated with the previous worker implementation.

I’ve worked on some projects that follow the Phoenix style, meaning that the directory structure is very similar to the previous one, but the file names are a bit different:

my_app/
├── README.md
├── lib
│   ├── my_app
│   │   └── workers
│   │       └── archive_account_worker.ex
│   └── my_app.ex
├── mix.exs
└── test
    ├── my_app
    │   └── workers
    │       └── archive_account_worker_test.exs
    ├── my_app_test.exs
    └── test_helper.exs

Also, the module names are a bit different:

MyApp.ArchiveAccountWorker for the worker implementation
MyApp.ArchiveAccountWorkerTest for the unit tests associated with the previous worker implementation.

Both approaches have worked fine for me. Once your team has made a decision, stick with it; to be honest, that’s the most important thing for me.

So, before you start coding, take some time to define the code organization and naming convention that you and your team want to follow.

I will follow the “Phoenix” way of doing things in the following code samples.

Keep calls to `Oban.insert` or `Oban.insert_all` contained in your worker

I highly recommend keeping the knowledge about how to enqueue an Oban job inside its Oban.Worker implementation. Following this approach, you also avoid polluting your controllers, resolvers, or contexts with a sequence of calls like the following:

my_job_args
|> MyApp.MyWorker.new()
|> Oban.insert()

Instead, you can create an enqueue/1 function like this:

defmodule MyApp.MyWorker do
  use Oban.Worker,
    queue: :things,
    max_attempts: 5,
    unique: [period: _period_in_seconds = round(:timer.hours(1) / 1000)]

  alias MyApp.Thing

  @doc """
  Enqueues an Oban job to do something with the given thing
  """
  @spec enqueue(Thing.t()) :: {:ok, Job.t()} | {:error, Job.changeset()} | {:error, term()}
  def enqueue(%Thing{id: thing_id}) do
    %{thing_id: thing_id}
    |> new()
    |> Oban.insert()
  end

  @impl Oban.Worker
  def perform(%Job{args: %{"thing_id" => _thing_id}} = _job) do
    :ok
  end
end

Remember that your enqueue function doesn’t need to have an arity of one; adjust the number of arguments depending on what your worker expects.

Even for more complex workers, you can apply the same convention, for example:

defmodule MyApp.TranscodeWorker do
  use Oban.Pro.Workers.Workflow

  alias MyApp.IndexingWorker
  alias MyApp.NotifyWorker
  alias MyApp.RecognizeWorker
  alias MyApp.SentimentWorker
  alias MyApp.TopicsWorker
  alias MyApp.TranscribeWorker

  def process_video(video_id) do
    args = %{id: video_id}

    new_workflow()
    |> add(:transcode, new(args))
    |> add(:transcribe, TranscribeWorker.new(args), deps: [:transcode])
    |> add(:indexing, IndexingWorker.new(args), deps: [:transcode])
    |> add(:recognize, RecognizeWorker.new(args), deps: [:transcode])
    |> add(:sentiment, SentimentWorker.new(args), deps: [:transcribe])
    |> add(:topics, TopicsWorker.new(args), deps: [:transcribe])
    |> add(:notify, NotifyWorker.new(args), deps: [:indexing, :recognize, :sentiment])
    |> Oban.insert_all()
  end

  # ...
end

NOTE: The previous example was borrowed and slightly modified from the _Workflow Example_ available in the Composing Jobs With Oban Pro post by Shannon and Parker.

One-off jobs

The easiest way to run one-off functions is via bin/RELEASE_NAME remote (or remote_console if you use distillery to create your release) on production nodes, but that’s not always available. In these cases, you can use Oban to run your one-off jobs.

If you plan to do a data migration, for example, consider wrapping this process into an Oban job doing the following.

Add a new file under lib/my_app/workers/migrations/, keeping the _worker.ex suffix, for example: lib/my_app/workers/migrations/my_data_migration_worker.ex. Your new worker module should look similar to the following:

defmodule MyApp.Migrations.MyDataMigrationWorker do
  use Oban.Worker,
    queue: :uno,
    max_attempts: 5,
    unique: [period: :infinity, states: Oban.Job.states()]

  @impl Oban.Worker
  def perform(%Job{} = _job) do
    # TODO: data migration
    # should return a valid value; :ok is only a placeholder
    # See: https://hexdocs.pm/oban/Oban.Worker.html#module-defining-workers
    :ok
  end
end

From the previous code snippet, notice that you must use the uno queue and define a unique: [period: :infinity, states: Oban.Job.states()] to indicate that any attempt to enqueue a subsequent job will be considered a duplicate as long as jobs are retained in the database, and for the specific case of the uno queue, we keep those jobs indefinitely. You can adjust the number of max_attempts based on your scenario.

You can test your data migration by adding a new file under test/my_app/workers/migrations/my_data_migration_worker_test.exs.

Once you have unit tested your worker following the suggestions that I will offer in the Testing your Workers and Configuration article, proceed to add an entry in your configuration file:

config :my_app, Oban,
  # ..
  plugins: [
    # ...
    {
      Oban.Pro.Plugins.DynamicCron,
      crontab: [
        # ...
        {"@reboot", MyApp.Migrations.MyDataMigrationWorker},
        # ...
      ]
    }
  ]

The @reboot string is a “non-standard syntax” that allows executing the given job at boot time in one single node in the cluster.

Challenges

Now, I think it’s time to examine a few challenges that you could find while working with Oban.

Inserting Oban jobs in bulk

Sometimes you need to enqueue many Oban Jobs at once or in bulk.

Please, don’t do this:

my_data_stream
|> Stream.map(&MyApp.MyWorker.new(%{asset_id: &1.id})) # <- use the arguments you really need for your worker
|> Enum.map(&Oban.insert/1)

This produces a lot of roundtrips to the database. Instead, you should use Oban.insert_all/4:

my_data_stream
|> Enum.map(&MyApp.MyWorker.new(%{asset_id: &1.id}))
|> Oban.insert_all()

While the previous approach avoids doing many roundtrips to the database, you can have problems depending on the number of jobs you’re trying to insert at once. Keep in mind that PostgreSQL’s binary protocol has a limit of 65,535 parameters that you may send in a single call. That presents an upper limit on the number of rows you may insert at one time and, therefore, on the number of jobs you may insert all at once.
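To make that limit concrete, here is a minimal sketch of the arithmetic. The number of parameters bound per job (12 below) is an assumption used only for illustration, not a figure taken from Oban, and ChunkSizeSketch is a hypothetical module name:

```elixir
defmodule ChunkSizeSketch do
  # Parameter limit per call quoted in the text above.
  @max_parameters 65_535

  @doc """
  Upper bound on the number of rows that fit in a single INSERT,
  given how many parameters each row binds.
  """
  def max_rows(params_per_row) when is_integer(params_per_row) and params_per_row > 0 do
    div(@max_parameters, params_per_row)
  end
end

# Assumption: every job binds 12 parameters (hypothetical value).
ChunkSizeSketch.max_rows(12)
# => 5461
```

Whatever the real number of parameters per job is in your setup, picking a chunk size comfortably below this bound leaves some headroom.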

So, it’s safer to split the previous stream into chunks:

timeouts = 
  my_data_stream
  |> Stream.map(&MyApp.MyWorker.new(%{asset_id: &1.id}))
  |> Stream.chunk_every(chunk_size)
  |> Task.async_stream(&Oban.insert_all/1, ordered: false, timeout: timeout_ms, on_timeout: :kill_task)
  |> Stream.filter(& &1 == {:exit, :timeout})
  |> Enum.count()

# TODO: handle timeouts 

Here we’re using our beloved Task.async_stream/3, which returns a stream that runs the given function, Oban.insert_all/1, concurrently over each chunk in the enumerable.

Adjust the chunk_size accordingly to handle your specific scenarios.
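If you want to see how the timeout accounting behaves without touching a database, the following sketch runs the same pipeline shape against a stand-in function. Everything here is hypothetical: BulkInsertSketch and fake_insert_all/1 are illustration-only names, and the stand-in simply sleeps when a chunk contains the :slow marker so that one task exceeds the timeout.

```elixir
defmodule BulkInsertSketch do
  # Hypothetical stand-in for Oban.insert_all/1. It sleeps when the chunk
  # contains :slow, simulating an insert that takes too long.
  def fake_insert_all(chunk) do
    if :slow in chunk, do: Process.sleep(500)
    length(chunk)
  end

  def run(items, chunk_size, timeout_ms) do
    results =
      items
      |> Stream.chunk_every(chunk_size)
      |> Task.async_stream(&fake_insert_all/1,
        ordered: false,
        timeout: timeout_ms,
        on_timeout: :kill_task
      )
      |> Enum.to_list()

    # Successful chunks come back as {:ok, count}; killed tasks as {:exit, :timeout}.
    inserted =
      results
      |> Enum.map(fn
        {:ok, count} -> count
        _other -> 0
      end)
      |> Enum.sum()

    timeouts = Enum.count(results, &(&1 == {:exit, :timeout}))

    %{inserted: inserted, timeouts: timeouts}
  end
end

BulkInsertSketch.run([1, 2, 3, 4, :slow, 6], 2, 100)
```

With these inputs, the third chunk exceeds the 100 ms timeout, so the call should report four inserted items and one timeout. In a real worker you would decide what to do with the chunks that timed out, for example retrying them with a smaller chunk size.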

One important thing to keep in mind when you use Oban.insert_all/2,4 is that you can insert duplicate jobs, even if your worker defines unique options like:

defmodule MyApp.MyWorker do
  use Oban.Worker,
    unique: [period: _period_in_seconds = round(:timer.hours(1) / 1000)]

  # ...
end

As noted in the documentation:

[Oban.insert_all/2] insertion respects prefix and log settings, but it does not use per-job unique configuration. You must use insert/2,4 or insert!/2 for per-job unique support.

But, sometimes, using Oban.insert/2,4 is too costly. You might want to insert hundreds or thousands of unique jobs as fast as possible; in these cases, you have two possibilities, at least that I’m aware of so far.

The first one is to guarantee uniqueness in the stream pipeline. However, this can be risky because it ignores the possibility that a duplicate job is already in the oban_jobs table. The second, safer option is to set up a partial unique index in PostgreSQL. You can create it with a migration like the following:

defmodule MyApp.Repo.Migrations.CreateUniqueIndexForMyWorker do
  use Ecto.Migration

  @disable_ddl_transaction true
  @disable_migration_lock true

  @index_name "oban_jobs_unique_my_worker_index"
  @worker_name "MyApp.MyWorker"

  def up do
    execute("""
    CREATE UNIQUE INDEX CONCURRENTLY #{@index_name} ON oban_jobs (worker, args) WHERE worker = '#{@worker_name}'
    """)
  end

  def down do
    execute("DROP INDEX IF EXISTS #{@index_name}")
  end
end

You can add more conditions to your partial unique index; adjust these settings based on your specific case.

But you may be wondering why this works at all. It turns out that Oban.insert_all/2,4 is a wrapper around Repo.insert_all/4 that sets the on_conflict option to :nothing. Better yet, once you have run the previous migration, you can test this behavior yourself by creating a unit test similar to this one:

refute payload
       |> MyBatchWorker.new_batch(batch_id: project.id)
       |> Oban.insert_all()
       |> Enum.any?(& &1.conflict?)

# second time all the jobs must create a conflict, but not an insertion
assert payload
       |> MyBatchWorker.new_batch(batch_id: project.id)
       |> Oban.insert_all()
       |> Enum.all?(& &1.conflict?)

In this unit test, we use the key :conflict? to detect the job uniqueness. From the Oban README, we have:

When unique settings match an existing job, the return value of Oban.insert/2 is still {:ok, job}. However, you can detect a unique conflict by checking the jobs’ :conflict? field. If there was an existing job, the field is true; otherwise it is false.

So, in the first pass, we check that none of the Oban jobs looks like %Oban.Job{conflict?: true}. In the second pass, we check that all the entries have %Oban.Job{conflict?: true}.

Nice, huh?

Complex workers

I won’t explain the workers offered by Oban.Pro in detail, not because I think they don’t deserve a space here, but because Shannon and Parker already did a fantastic job in their post Composing Jobs With Oban Pro. They take you on a tour of the workers included in Pro and explore some real-world use cases where each one shines.

Limitations

If you have reached this point, you undoubtedly noticed that I have enjoyed working with Oban so far, and at this point, I could be biased, so I want to take a step back to add some balance and mention some of Oban’s limitations.

Oban is “job processing in Elixir, backed by modern PostgreSQL”. I don’t see this as a limitation, but you can’t use Oban if you don’t use PostgreSQL in your stack. Thankfully, that hasn’t been a problem for me for many years :)
If you use PostgreSQL, but you have an overloaded database currently, I don’t recommend adding more pressure with Oban. You have a bigger problem that you need to solve as soon as possible—you need to figure out what’s overloading your database and fix it!
If you still want to use Oban, but you’re thinking of adding a new PostgreSQL database just for the Oban workers, keep in mind that you will lose one of the most relevant features from Oban, which is the built-in transactional control.
If you want good performance in Oban Web, you should keep your oban_jobs table lean. I’ve noted some delays in the UI at around ten million records.
If you’re trying to ingest high-throughput event streams, Oban probably isn’t the solution you need. You can process more than 15K jobs per second with Oban on a single node; of course, that number depends heavily on what your worker does, and you should be able to increase that throughput significantly if you use the Batch Worker behaviour. Still, sometimes that’s not enough, and you should use a message broker, like RabbitMQ, in conjunction with Broadway or GenStage. Keep in mind that the initial learning curve of those tools can be steeper, and you also need to take other things into account, like the state of your processes while you deploy.

Wishlist

In the same vein as the previous section, there are a few things that I would like to see in Oban:

Automatic load regulation: suppose that your production deployment shares the API server with your Oban workers; these processes will compete with each other for resources. With a built-in regulation framework, your queues could be constrained a bit more when the load on your nodes is too high. Conversely, if your queue has no rate limit, the concurrency limit could be raised automatically when the load on the node is low. An excellent reference on this topic is the paper Generic Load Regulation Framework for Erlang by Ulf Wiger. As far as I can tell, this feature is already on the roadmap.
A callback in the {Fixed,Dynamic}Pruner plugins; that way, before proceeding with the deletion, you can store or transfer those records into cold storage or somewhere else. I recently opened an issue about this feature.

Do you have things that you would like to see in Oban? If that’s the case, and you want to share those wishes with me, you can reach me at @milmazz on Twitter, or you can find me in the #oban channel on the Elixir Slack.

Community

You can connect with the Oban authors, contributors, and other Oban users through any of these channels:

#oban channel on the Elixir Slack. Here you can see new announcements about new Oban releases.
Follow @sorentwo on Twitter for tips, announcements, and news about Oban
If you want to contribute to the OSS project, feel free to do it at https://github.com/sorentwo/oban. You might start with the issues list. I’ve been able to contribute more than a dozen times, and on each occasion that I had a doubt, the authors were welcoming and offered good references. That matches my general experience in the Elixir community, so don’t be shy and join the team :)

Conclusion

Oban is a solid solution for handling your background jobs in Elixir. It stays true to what the README promises: fewer dependencies, isolated queues, transactional control, unique, scheduled, and recurrent jobs, telemetry integration, and much more.

After all this, I think my position is clear: if you or your company need a background job system in Elixir, I recommend buying the Oban Web+Pro license. Keep in mind that when you buy the license, you’re helping Shannon and Parker keep investing their time in building more features for the open-source release of Oban.

If you have any pattern that you follow when you use Oban and you want to share that with me, you can reach me at @milmazz on Twitter, or you can find me in the #oban channel on the Elixir Slack.

That’s all folks! Thanks for reading.

Acknowledgments

Thank you Parker Selbert for reviewing drafts of this post.

Improve the codebase of an acquired product

2020-04-01T13:27:31-05:00

In this article, I’ll share my experience improving the codebase of an acquired product, which wouldn’t have been possible without the help of a fantastic team. Before diving into the initial diagnostic and the strategies that we used to tackle technical debt, I’ll share some background around the acquisition. Let’s start.

Background

I wasn’t involved in the acquisition process. I started working with this client around a year after that decision; that’s why in this article, I do not mention those details. Maybe the only thing that I can share is that Elixir and Phoenix power the product, and possibly at the time of the acquisition, fulfilled its goal and covered some opportunity costs for the client.

I’ll assume that the previous developers made their best effort under a set of constraints (time, team size, among others), and I don’t know how good or bad those constraints were. This codebase now belongs to our team, and our goal is to take ownership of it. To accomplish that, we established some common goals:

Do not break current working features given that the system is in production, and it’s serving millions of requests per day
Add new features based on requirements or stories, the usual thing
Improve the codebase along the way

After the acquisition process, the client knew that the codebase was a “dumpster fire” (their words, not mine). Still, it was generating revenue, so throwing away that product was out of the question. Thankfully, the client allowed some time to improve the codebase.

At the same time, we delivered new features, even accepted the fact that we had to delay some deployments until we thought it was the right time to do it. Again, I mention this because this kind of agreement is difficult to get sometimes.

Diagnostics

The repository followed a poncho structure; the state of some of the applications was the following:

Zero (none, nada) unit tests, this was awful because it made the whole refactoring process more painful and slow.
No documentation in the modules, public functions, only a brief README file describing the goal of the application, and that was all.
A lot of 3rd party dependencies, the most painful part here was the dependency of different storage services. For example, Airtable, Mongo, PostgreSQL.
The goals of some project dependencies were the same, like HTTP clients, JSON parsers, job queue processors, among others. A teammate even said something like: “…background worker libraries are like Pokemon, and I think we have collected them all”, so that could give you an idea of our situation.
We found large portions of duplicate implementations, for example, HTTP clients implementations embedded in some of the poncho applications instead of having that HTTP client implementation as a dependency.

Strategies

The team decided to tackle these improvements in a series of continuous steps.

Move the other “storage services” into PostgreSQL one table at a time. The current team has a better knowledge of this ORDBMS, and it’s been offering excellent results.
Delete unused code; this was a low-hanging-fruit for us using some tools that I’ll mention later in this article.
Separate the implementation of 3rd party HTTP clients from the main components. So, reaching a given 3rd party service must be done via a separate application.
Improve some design decisions
Introduce error monitoring, that allowed us to discover, triage, and prioritize errors in real-time.

Tooling support

As I mentioned before, removing unreachable or unused code was easy for us, and with that, we removed a considerable burden when trying to understand how the applications work. Here the critical player in my workflow was unused, which is a command-line tool, written in Haskell, that helps you to identify unused code in multiple languages and that includes Elixir.

Before we proceeded with the code removal, we wanted to be confident that the code we were removing wouldn’t break the program. One way we gained that confidence was mix xref callers CALLEE, which prints all the callers of the given CALLEE¹.

Then, we started adding unit tests to each project. Unit testing is a first-class concern on our team, so, before proceeding with a storage service migration (e.g., Moving a Mongo collection into a PostgreSQL table), we added unit tests, using the ex_unit framework, around the modules and functions that we were planning to migrate. While doing this, we did some test coverage analysis with mix test --cover to identify untested areas of the code.

Adding unit tests around the areas we were planning to migrate increased our confidence in the following steps, and also helped us to discover other bugs along the way. So, that was a win-win situation for us.

Another great addition was the error monitoring process. Before that, we were almost blind to the things that were failing in production. Well, that’s not entirely accurate: we had access to the logs, but nobody on the team was checking them constantly.

The team was already using Sentry in other applications, so we decided to use Sentry here too. We started by linking Sentry with our GitHub repository. Then we tackled one bug at a time across multiple applications, giving priority to the events with a higher number of occurrences. To give you an idea of the situation at the beginning, some of the applications had so many errors that Sentry responded with HTTP 429 (Too Many Requests) errors; that’s why some team members applied the term “dumpster fire” to this legacy application. Today the situation is another story, so I recommend adding an error monitoring tool to your projects as soon as possible, and this includes your staging environments too.

We used other tools too, like dialyzer. It’s true that dialyzer can be intricate sometimes. Still, I would say that it helps you discover bugs in your code, as well as incorrect types and specifications (a.k.a. typespecs).

Last but not least, credo has been useful to run static code analysis in our codebase and to promote some consistency and readability.

While we haven’t included ex_doc in our workflow yet, we have added a bunch of documentation in our code, and it’s already paying dividends; newer team members mentioned that the onboarding process is more accessible thanks to that documentation.

Design

While we were refactoring, we made many decisions to improve the codebase. One of the most recent came after we identified that the company was losing money when an agent/user started a session (LOGON) in a 3rd party system but wasn’t yet READY to operate. The interaction with the 3rd party service was something like this:

Every second a process collected data from the 3rd party service because they didn’t offer a webhook that allowed us to subscribe to those events
Then, the producer sent all the collected messages to some other modules via Phoenix Channels, including the one that was in charge of forcing those agents to be READY to operate
The process that put those agents in the READY state was a single GenServer instance

So, given the previous scenario, and given that a GenServer can process only one message at a time, we had some agents in the NOT_READY state waiting for the FORCE_READY operation. What we did, in this case, was to split the input by agent ID and then dynamically create one GenServer per agent/user, whose state was all about that agent. Once the agent finished its session, we stopped that GenServer instance. This change allowed us to increase the throughput of our FORCE_READY operations thanks to concurrency, and prepared us for a new feature, which kicked users after a configurable time of inactivity.
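The general shape of that change can be sketched as follows. This is not the code from that project: every module, function, and message name below is hypothetical, and the sketch only illustrates the idea of one temporary GenServer per agent, registered by agent ID and started under a DynamicSupervisor.

```elixir
defmodule AgentSessionSketch do
  # Hypothetical per-agent process. restart: :temporary keeps the supervisor
  # from restarting it after the session is stopped on purpose.
  use GenServer, restart: :temporary

  def start_link(agent_id) do
    GenServer.start_link(__MODULE__, agent_id, name: via(agent_id))
  end

  def force_ready(agent_id), do: GenServer.call(via(agent_id), :force_ready)

  def stop(agent_id), do: GenServer.stop(via(agent_id))

  # Processes are looked up by agent ID through a Registry.
  defp via(agent_id), do: {:via, Registry, {AgentRegistrySketch, agent_id}}

  @impl GenServer
  def init(agent_id), do: {:ok, %{agent_id: agent_id, status: :not_ready}}

  @impl GenServer
  def handle_call(:force_ready, _from, state) do
    # A real system would call the 3rd party service here.
    {:reply, :ready, %{state | status: :ready}}
  end
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: AgentRegistrySketch)
{:ok, supervisor} = DynamicSupervisor.start_link(strategy: :one_for_one)

# One process per agent, started when the session begins.
for agent_id <- ["agent-1", "agent-2"] do
  {:ok, _pid} = DynamicSupervisor.start_child(supervisor, {AgentSessionSketch, agent_id})
end

# Each agent is handled by its own process, so one slow agent
# does not block the others.
:ready = AgentSessionSketch.force_ready("agent-1")

# Stop the process once the agent finishes its session.
:ok = AgentSessionSketch.stop("agent-2")
```

Because each agent has its own mailbox, FORCE_READY requests for different agents no longer queue behind a single process.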

Previously, the kicking process was done manually by our support staff. We added the “kicking inactive users” feature while we were at a work retreat (the company is 100% remote). You can’t imagine the happiness shown by our support staff after deploying that feature and seeing how the application was automatically kicking inactive users. Sometimes, as developers, we lose perspective on the impact that we can have on people’s lives; it was a humbling experience.

Rewards

Staff & support teams are enjoying the fruits of these improvements while doing fewer manual processes. Of course, we still have some areas that we need to polish. We’re getting there.
We improved the onboarding experience for newer developers, while also improving the experience for older developers
Our confidence while making changes has increased
We deliver new features more quickly
Savings: while we haven’t completed the whole migration from some storage services, we estimate that we will save the company more than 20K a month after finishing the whole process, which, according to our plan, will be at the end of next month.

Wrapping Up

Elixir and Phoenix, even in the worst-case scenarios, have demonstrated that they allow companies to deliver excellent results. But it’s the team culture, and following effective practices, that makes it possible to improve the user experience while enhancing the performance and the quality of the code.

¹ Another tool that you should probably check out is Mix Unused. I haven’t used it yet because I only recently learned about its existence, but I plan to try it sooner rather than later.

Elixir’s MIME library review

2018-11-23T13:40:31-06:00

Elixir’s MIME is a read-only and immutable library that embeds the MIME type database, so users can map MIME (Multipurpose Internet Mail Extensions) types to extensions and vice versa. It’s a really compact project and includes nice features, which I’ll try to explain in case you’re not familiar with the library. Then, I’ll focus on MIME’s internals, or how it was built, and also on how MIME elegantly illustrates so many features of Elixir itself. One of the goals, maybe the main one, of this library is to offer a performant lookup of the MIME database at runtime; that’s why new MIME types can only be added at compile time via configuration, but we’ll talk about this option later. First, let’s review its public API.

API

MIME offers a short set of functions, which cover the most relevant cases when you work with MIME types.

Let’s quickly review the MIME library API. Most of the examples were taken from MIME’s documentation page.

`extensions(String.t()) :: [String.t()]`

Returns the extensions associated with the given MIME type.

iex> MIME.extensions("application/json")
["json"]
iex> MIME.extensions("foo/bar")
[]

`type(String.t()) :: String.t()`

Returns the MIME type related to the given file extension.

iex> MIME.type("txt")
"text/plain"

`from_path(Path.t()) :: String.t()`

Guesses the MIME type based on the path’s extension.

iex> MIME.from_path("index.html")
"text/html"

`has_type?(String.t()) :: boolean`

Returns whether an extension has a MIME type associated.

iex> MIME.has_type?("txt")
true
iex> MIME.has_type?("foobarbaz")
false

`valid?(String.t()) :: boolean`

Returns whether a MIME type is registered.

iex> MIME.valid?("text/plain")
true
iex> MIME.valid?("foo/bar")
false

Who is using MIME library?

At the time of this writing, and according to the statistics available from the Hex package manager, the MIME library has 21 dependent projects, among which you can find Plug, Phoenix, Tesla, Swoosh, etc., and it has been downloaded almost 6 million times. But more important, at least to me, is how the MIME library is implemented: its code is really concise, around 200 SLOC (Source Lines Of Code) including comments, and it embeds captivating concepts.

How was the MIME library built?

Now, let’s start looking into how the MIME library was built.

Inside of the MIME.Application.quoted/1 function you can find the following section:

# file: lib/mime/application.ex
mime_file = Application.app_dir(:mime, "priv/mime.types")
@compile :no_native
@external_resource mime_file
stream = File.stream!(mime_file)

mapping =
  for line <- stream,
      not String.starts_with?(line, ["#", "\n"]),
      [type | exts] = line |> String.trim() |> String.split(),
      exts != [],
      do: {type, exts}

You can notice that the MIME library transforms the data located in the priv/mime.types file, which is a copy of the IANA (Internet Assigned Numbers Authority) database in text format and describes what Internet media types are sent to the client for the given file extension(s). Keep in mind that sending the correct media type to the client is important so they know how to handle the content of the file.

Here is an example of the priv/mime.types file content:

# file: priv/mime.types
# IANA types

# MIME type         Extensions
application/3gpp-ims+xml
application/ATXML       atxml
application/atom+xml        atom
application/atomcat+xml       atomcat
application/octet-stream    bin lha lzh exe class so dll img iso
application/pdf         pdf

To do the transformation, the MIME library reads the file line by line (via File.stream!/3) at compile time and ignores empty lines and lines that start with a comment (#). After that, it removes the leading and trailing whitespace and then splits each line into a list of substrings: the head of this list represents the MIME type, and the tail represents the extensions. That’s why, at the end of the for comprehension, you can see an extra filter that ignores MIME types that do not have any extensions associated (e.g. application/3gpp-ims+xml); this is an optimization that reduces the compilation time. Finally, it creates a list of {type, extensions} tuples. The result of this transformation is stored in a binding called mapping.
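You can try the same transformation outside the library. The sketch below is an approximation rather than MIME’s own code: it uses an inline string instead of priv/mime.types, and String.split/3 with trim: true instead of File.stream!/3, so empty lines are already gone before the comprehension runs.

```elixir
# Inline sample standing in for the contents of priv/mime.types
sample = """
# IANA types

# MIME type         Extensions
application/3gpp-ims+xml
application/atom+xml        atom
application/octet-stream    bin lha lzh exe class so dll img iso
application/pdf         pdf
"""

mapping =
  for line <- String.split(sample, "\n", trim: true),
      # skip comment lines
      not String.starts_with?(line, "#"),
      # head is the MIME type, tail is the list of extensions
      [type | exts] = String.split(line),
      # drop types that have no extensions
      exts != [],
      do: {type, exts}

IO.inspect(mapping)
```

The comment lines and application/3gpp-ims+xml are filtered out, so mapping ends up with three {type, extensions} tuples, in the same order as the sample.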

It is important to note the usage of two module attributes. The first one is @external_resource, which, as its name implies, specifies an external resource for the given module; tools like Mix use this attribute to know whether the current module needs to be recompiled when any external resource is updated. The second one, the @compile attribute, defines options for the module compilation and is used to configure both the Elixir and Erlang compilers.

Once the MIME library has transformed the data and stored the result in mapping, it creates two private helper functions:

The first private helper function is ext_to_mime/1, which returns the MIME type given an extension:

@spec ext_to_mime(String.t()) :: String.t() | nil
defp ext_to_mime(type)

for {type, exts} <- mapping,
    ext <- exts do
  defp ext_to_mime(unquote(ext)), do: unquote(type)
end

defp ext_to_mime(_ext), do: nil

The MIME library creates thousands of ext_to_mime/1 function clauses inside the for comprehension. This is a clear example of the power of metaprogramming and of how the MIME library relies on pattern matching to be performant. This is possible because Elixir’s quote and unquote mechanisms provide a feature called unquote fragments, which makes it easy to create functions on the fly at compile time.

To give you a better idea, here is a section of the final result:

defp ext_to_mime("atom"), do: "application/atom+xml"
defp ext_to_mime("pdf"), do: "application/pdf"
defp ext_to_mime("dll"), do: "application/octet-stream"
defp ext_to_mime("class"), do: "application/octet-stream"
# ...
defp ext_to_mime(_ext), do: nil

To complete the function declaration, you can see a catch-all function clause, which is used when no match is found.
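The technique is easy to try in isolation. In this minimal sketch, the module name and the two-entry mapping are hypothetical, and the function is public only so it can be called from outside the module:

```elixir
defmodule UnquoteFragmentSketch do
  # Hypothetical mapping, standing in for the list built from priv/mime.types
  mapping = [
    {"application/pdf", ["pdf"]},
    {"application/octet-stream", ["bin", "dll"]}
  ]

  # One function clause per extension, generated at compile time
  for {type, exts} <- mapping, ext <- exts do
    def ext_to_mime(unquote(ext)), do: unquote(type)
  end

  # Catch-all clause, used when no generated clause matches
  def ext_to_mime(_ext), do: nil
end

UnquoteFragmentSketch.ext_to_mime("dll")
# => "application/octet-stream"

UnquoteFragmentSketch.ext_to_mime("zip")
# => nil
```

After compilation, the module contains three literal clauses plus the catch-all, so a lookup is plain pattern matching on the argument.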

The second private helper function declaration is mime_to_ext/1, this function expects a MIME type and will return a list of extensions or nil.

@spec mime_to_ext(String.t()) :: list(String.t()) | nil
defp mime_to_ext(type)

for {type, exts} <- mapping do
  defp mime_to_ext(unquote(type)), do: unquote(exts)
end

defp mime_to_ext(_type), do: nil

The result of the transformation should be similar to:

defp mime_to_ext("application/atom+xml"), do: ["atom"]
defp mime_to_ext("application/octet-stream"),
  do: ["bin", "lha", "lzh", "exe", "class", "so", "dll", "img", "iso"]
defp mime_to_ext("application/pdf"), do: ["pdf"]
# ...
defp mime_to_ext(_type), do: nil

From here, it is easy to build the public functions:

@spec valid?(String.t()) :: boolean
def valid?(type) do
  is_list(mime_to_ext(type))
end

def extensions(type) do
  mime_to_ext(type) || []
end

@default_type "application/octet-stream"

@spec type(String.t()) :: String.t()
def type(file_extension) do
  ext_to_mime(file_extension) || @default_type
end

def has_type?(file_extension) do
  is_binary(ext_to_mime(file_extension))
end

def from_path(path) do
  case Path.extname(path) do
    "." <> ext -> type(downcase(ext, ""))
    _ -> @default_type
  end
end

defp downcase(<<h, t::binary>>, acc) when h in ?A..?Z,
  do: downcase(t, <<acc::binary, h + 32>>)

defp downcase(<<h, t::binary>>, acc), do: downcase(t, <<acc::binary, h>>)
defp downcase(<<>>, acc), do: acc

That’s it! With these functions, the MIME library covers its main features. But wait, there is more. Do you remember that at the beginning I mentioned the following:

One of the goals, maybe the main one, of this library is to offer a performant lookup of the MIME database at runtime, that’s why new MIME types can only be added at compile-time via configuration, but we’ll talk about this option later…

So, this means that we can add a MIME type like application/wasm for the wasm extension, which has been added to the provisional standard media type registry but is not official yet.

Via configuration you can do the following:

# file: config/config.exs
use Mix.Config

config :mime, :types, %{"application/wasm" => ["wasm"]}

Then, MIME needs to be recompiled, using Mix you can do the following:

mix deps.clean mime --build
mix deps.get

You can test the result via IEx:

iex> MIME.type("wasm")
"application/wasm"
iex> MIME.extensions("application/wasm")
["wasm"]

Now you may be wondering: how does it work? That’s an excellent question, so let’s try to find an answer.

In the previous function declarations of ext_to_mime/1 and mime_to_ext/1, I omitted two function clauses on purpose, which are specifically related to the handling of custom types. Let’s see the whole declaration of those two functions now:

@spec ext_to_mime(String.t()) :: String.t() | nil
defp ext_to_mime(type)

for {type, exts} <- custom_types,
    ext <- List.wrap(exts) do
  defp ext_to_mime(unquote(ext)), do: unquote(type)
end

for {type, exts} <- mapping,
    ext <- exts do
  defp ext_to_mime(unquote(ext)), do: unquote(type)
end

defp ext_to_mime(_ext), do: nil

@spec mime_to_ext(String.t()) :: list(String.t()) | nil
defp mime_to_ext(type)

for {type, exts} <- custom_types do
  defp mime_to_ext(unquote(type)), do: unquote(List.wrap(exts))
end

for {type, exts} <- mapping do
  defp mime_to_ext(unquote(type)), do: unquote(exts)
end

defp mime_to_ext(_type), do: nil

Now you can see that these custom MIME types come first. But wait a minute, where is the custom_types binding coming from? Well, that’s the first argument of the MIME.Application.quoted/1 function.

There is another function that uses the custom_types binding, and that’s compiled_custom_types/0, which as its name implies, returns the custom types compiled into the MIME library.

def compiled_custom_types do
  unquote(Macro.escape(custom_types))
end

The last thing that I want to mention related to the function MIME.Application.quoted/1 is that this function returns a quoted expression:

def quoted(custom_types) do
  quote bind_quoted: [custom_types: Macro.escape(custom_types)] do
    mime_file = Application.app_dir(:mime, "priv/mime.types")
    @compile :no_native
    # ...
  end
end

You can see here that the bind_quoted option passes a binding to the quoted expression. Keep in mind that bind_quoted is recommended every time you want to inject a value into a quote.

If you execute the function MIME.Application.quoted/1 in an IEx session you will get something like this:
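To make the difference concrete, here is a minimal sketch, independent of the MIME source, that injects the same value into a quote once with unquote and once with bind_quoted. The module and function names (QuoteSketch, with_unquote/1, with_bind_quoted/1) are made up for illustration.

```elixir
defmodule QuoteSketch do
  # Hypothetical helper: the value is injected wherever unquote/1 appears.
  def with_unquote(value) do
    quote do
      unquote(value) * 2
    end
  end

  # Hypothetical helper: the value is bound once, at the top of the quote,
  # and then used as a regular variable inside it.
  def with_bind_quoted(value) do
    quote bind_quoted: [value: value] do
      value * 2
    end
  end
end

# Evaluate both quoted expressions; each call returns {result, binding}.
{from_unquote, _binding} = Code.eval_quoted(QuoteSketch.with_unquote(21))
{from_bind_quoted, _binding} = Code.eval_quoted(QuoteSketch.with_bind_quoted(21))

IO.inspect({from_unquote, from_bind_quoted})
```

Both versions evaluate to the same result. The difference is that bind_quoted evaluates the injected value exactly once, and it leaves unquote calls inside the quote untouched, which is what makes the unquote fragments in the MIME function clauses possible.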

iex> MIME.Application.quoted(%{})
{:__block__, [],
 [
   {:=, [], [{:custom_types, [], MIME.Application}, {:%{}, [], []}]},
   {:__block__, [],
    [
      {:@, [context: MIME.Application, import: Kernel],
       [
         {:moduledoc, [context: MIME.Application],
          # ...

So, our beloved MIME.Application.quoted/1 function is actually returning an Elixir data structure. But, who consumes that data structure? Let’s check the lib/mime.ex file contents:

# file: lib/mime.ex
quoted = MIME.Application.quoted(Application.get_env(:mime, :types, %{}))
Module.create(MIME, quoted, __ENV__)

Believe me, that’s all the content of lib/mime.ex at the moment. The first line calls MIME.Application.quoted/1, passing as its argument the custom MIME types defined via configuration, or an empty map as a fallback, and stores the result in the quoted binding. The second line creates a module named MIME whose body is that quoted expression. Keep in mind that Module.create/3 is preferred over Kernel.defmodule/2 when the module body is a quoted expression; another advantage is that Module.create/3 allows you to control the environment used when defining the module.
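The same pattern can be reproduced in a few lines. The following is only a sketch, loosely modelled on the code above: MimeSketch.Builder, MimeSketch.Lookup and the text/x-sketch type are hypothetical names, not part of the MIME library.

```elixir
defmodule MimeSketch.Builder do
  # Hypothetical builder, loosely modelled on MIME.Application.quoted/1.
  def quoted(types) do
    quote bind_quoted: [types: Macro.escape(types)] do
      # One function clause per extension, generated with unquote fragments.
      for {type, exts} <- types, ext <- List.wrap(exts) do
        def ext_to_mime(unquote(ext)), do: unquote(type)
      end

      # Catch-all clause for unknown extensions.
      def ext_to_mime(_ext), do: nil
    end
  end
end

types = %{"application/wasm" => ["wasm"], "text/x-sketch" => ["sk", "sketch"]}

# Define the module at runtime from the quoted expression.
Module.create(MimeSketch.Lookup, MimeSketch.Builder.quoted(types), Macro.Env.location(__ENV__))

IO.inspect(MimeSketch.Lookup.ext_to_mime("wasm"))
IO.inspect(MimeSketch.Lookup.ext_to_mime("unknown"))
```

Because every extension becomes its own function clause, the lookup is resolved by pattern matching instead of by walking a data structure at runtime.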

Automatic recompilation

When we started talking about how to add custom MIME types via configuration, we also mentioned that we need to recompile the library. So, what happens in the case you forget about doing that? Well, your changes will not take effect until the dependency is manually recompiled, and before the release 1.3.0 you didn’t see any warning about it.

Since version 1.3.0 of the MIME library, the recompilation process is automatic if the compile-time database is out of date.

# file: lib/mime/application.ex
defmodule MIME.Application do
  use Application
  require Logger

  def start(_, _) do
    app = Application.fetch_env!(:mime, :types)

    if app != MIME.compiled_custom_types() do
      Logger.error("""
      The :mime library has been compiled with the following custom types:

          #{inspect(MIME.compiled_custom_types())}

      But it is being started with the following types:

          #{inspect(app)}

      We are going to dynamically recompile it during boot,
      but please clean the :mime dependency to make sure it is recompiled:

          $ mix deps.clean mime --build

      """)

      Module.create(MIME, quoted(app), __ENV__)
    end

    Supervisor.start_link([], strategy: :one_for_one)
  end

  def quoted(custom_types) do
  # ...
  end
end

So, what this means is that at boot time the MIME library will log an error and dynamically recompile the MIME module if the custom MIME types in the user’s environment differ from the ones returned by MIME.compiled_custom_types/0, which is great! But, as the log message says, it’s still recommended to clean the :mime dependency to make sure it’s recompiled.

Summary

Elixir MIME is a short but powerful library, its goal is clear, and in just around 200 SLOC you can see a lot of nice concepts, like meta-programming, file streams, pattern matching, macros, unquote fragments, dynamic module creation, dynamic recompilation at boot-time, among other really cool stuff.

That’s all folks! Thanks for reading.

Follow-up: Function currying in Elixir

2017-09-19T16:00:00-05:00

NOTE: This article is a follow-up examination after the blog post Function currying in Elixir by @stormpat

In his article, Patrik Storm shows how to implement function currying in Elixir, which could be really neat in some situations. For those who haven’t read Patrik’s post, let us first clarify what function currying is.

Currying is the process of transforming a function that takes multiple arguments (that is, with an arity greater than one) into a function that takes only one argument and returns another function if any arguments are still required. When the last required argument is given, the function automatically executes and computes the result. As a first step, let us apply function currying manually:

iex(1)> greet = fn greeting, name -> IO.puts "#{greeting}, #{name}" end
#Function<12.52032458/2 in :erl_eval.expr/5>
iex(2)> greet.("Hello", "John") # uncurried function
Hello, John
:ok
iex(3)> greetCurry = fn greeting -> fn name -> IO.puts "#{greeting}, #{name}" end end
#Function<6.52032458/1 in :erl_eval.expr/5>
iex(4)> greetCurry.("Hello").("John")
Hello, John
:ok

To get a general solution, Patrik uses a nice approach that combines pattern matching and recursion. Let’s dive into his implementation:

# file: curry.exs
defmodule Curry do
  def curry(fun) do
    {_, arity} = :erlang.fun_info(fun, :arity)
    curry(fun, arity, [])
  end

  def curry(fun, 0, arguments) do
    apply(fun, Enum.reverse arguments)
  end

  def curry(fun, arity, arguments) do
    fn arg -> curry(fun, arity - 1, [arg | arguments]) end
  end
end

The main points in this Curry module are the following:

Curry.curry/1 represents our entry point. This function uses :erlang.fun_info/2 to find out the arity (number of arguments) of the given function fun. Then, we pass control to the function Curry.curry/3.
The recursive function Curry.curry/3 returns anonymous functions that take exactly one argument.
When the last required argument is given, we use Kernel.apply/2 to invoke the given function fun with the accumulated list of arguments.

Let’s show how we can use function currying. I’ll use the same examples that Patrik used in his post, but with ExUnit:

# file: curried.exs
defmodule Curried do
  import Curry

  def match term do
    curry(fn what -> (Regex.match?(term, what)) end)
  end

  def filter f do
    curry(fn list -> Enum.filter(list, f) end)
  end

  def replace what do
    curry(fn replacement, word ->
      Regex.replace(what, word, replacement)
    end)
  end
end

Our unit tests:

# file: curry_test.exs
ExUnit.start()

Code.require_file("curry.exs", __DIR__)
Code.require_file("curried.exs", __DIR__)

defmodule CurryTest do
  use ExUnit.Case

  test "applying all the params at once or one step at a time should produce same results" do
    curried = Curry.curry(fn a, b, c, d -> a * b + div(c, d) end)
    five_squared = curried.(5).(5)

    assert five_squared.(10).(2) == curried.(5).(5).(10).(2)
  end

  test "curry allow to create composable functions" do
    has_spaces = Curried.match(~r/\s+/)
    sentences = Curried.filter(has_spaces)
    disallowed = Curried.replace(~r/[jruesbtni]/)
    censored = disallowed.("*")

    allowed = sentences.(["justin bibier", "and sentences", "are", "allowed"])

    assert "****** ******" == allowed |> List.first() |> censored.()
  end
end

Now we can run our tests as follows:

$ elixir curry_test.exs
..

Finished in 0.2 seconds (0.2s on load, 0.00s on tests)
2 tests, 0 failures

Randomized with seed 604000

It works, but I feel we can improve a few things. In this case, our curry function assumes that the arguments are always given from left to right. What if we want to give the parameters from right to left? Let’s introduce curryRight:

# file: curry.exs
defmodule Curry do
  def curry(fun) when is_function(fun), do: curry(fun, :left)

  def curryRight(fun) when is_function(fun), do: curry(fun, :right)

  defp curry(fun, direction) do
    {_, arity} = :erlang.fun_info(fun, :arity)
    curry(fun, arity, [], direction)
  end

  defp curry(fun, 0, args, :left) do
    apply(fun, Enum.reverse(args))
  end

  defp curry(fun, 0, args, :right) do
    apply(fun, args)
  end

  defp curry(fun, arity, args, direction) do
    &curry(fun, arity - 1, [&1 | args], direction)
  end
end

Then, our Curried module, which holds support functions, is much simpler if we do the following:

# file: curried.exs
defmodule Curried do
  import Curry

  def match(term), do: curry(&Regex.match?/2).(term)

  def filter(f), do: curryRight(&Enum.filter/2).(f)

  def replace(what), do: curry(&Regex.replace(&1, &3, &2)).(what)
end

Now, without any change in our unit tests, we can verify that everything is working as before.

$ elixir curry_test.exs
..

Finished in 0.2 seconds (0.2s on load, 0.00s on tests)
2 tests, 0 failures

Randomized with seed 561000

Do we need to apply curry to everything?

No, it always depends on your case. First, let’s see how bad it can get if we apply currying manually, and then we will try to express this whole process as a data transformation workflow.

# file: manual_currying.exs
defmodule ManualCurrying do
  def match(term) do
    fn what -> Regex.match?(term, what) end
  end

  def filter(f) do
    fn list -> Enum.filter(list, f) end
  end

  def replace(what) do
    fn replacement ->
      fn word ->
        Regex.replace(what, word, replacement)
      end
    end
  end
end

Our unit tests:

# file: manual_currying_test.exs
ExUnit.start()

Code.require_file("manual_currying.exs", __DIR__)

defmodule ManualCurryingTest do
  use ExUnit.Case
  import ManualCurrying

  test "applying all the params at once or one step at a time should produce same results" do
    curried =
      fn a ->
        fn b ->
          fn c ->
            fn d ->
              a * b + div(c, d)
            end
          end
        end
      end

    five_squared = curried.(5).(5)

    assert five_squared.(10).(2) == curried.(5).(5).(10).(2)
  end

  test "curry allow to create composable functions" do
    has_spaces = match(~r/\s+/)
    sentences = filter(has_spaces)
    disallowed = replace(~r/[jruesbtni]/)
    censored = disallowed.("*")

    allowed = sentences.(["justin bibier", "and sentences", "are", "allowed"])

    assert "****** ******" == allowed |> hd() |> censored.()
  end
end

But if you only need to execute this once, we can do better by treating everything as a data transformation workflow, which is actually the most succinct way:

"****** ******" ==
  ["justin bibier", "and sentences", "are", "allowed"]
  |> Enum.filter(&Regex.match?(~r/\s+/, &1))
  |> hd()
  |> String.replace(~r/[jruesbtni]/, "*")

Wrapping up

Function currying is an interesting technique that allows us to reuse functions; for example, we can create a module with small functions that behave consistently without much effort. However, we need to keep the argument order in mind when we want to apply function currying. Sometimes, for functions like Enum.map/2, Enum.reduce/2, Enum.filter/2, etc., it is better or easier to use curryRight than curry. Normally our decision will depend on which arguments change most often, because we want to put those at the end of the execution path.

As a final note, it could be an interesting exercise to implement uncurry, which is a function that converts a curried function to a function with arity n, so that we can convert between these two forms in either direction.
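One possible starting point for that exercise is sketched below. It is not taken from Patrik’s post: the UncurrySketch module and its functions are hypothetical, and it only covers fixed arities of two and three instead of a general arity n.

```elixir
defmodule UncurrySketch do
  # Feed the arguments one at a time to a curried function.
  def apply_curried(curried, args) when is_list(args) do
    Enum.reduce(args, curried, fn arg, fun -> fun.(arg) end)
  end

  # Convert a curried function into a regular function of arity 2.
  def uncurry2(curried), do: fn a, b -> apply_curried(curried, [a, b]) end

  # Convert a curried function into a regular function of arity 3.
  def uncurry3(curried), do: fn a, b, c -> apply_curried(curried, [a, b, c]) end
end

# A manually curried function of three arguments.
add = fn a -> fn b -> fn c -> a + b + c end end end

IO.inspect(UncurrySketch.uncurry3(add).(1, 2, 3))
```

Anonymous functions in Elixir have a fixed arity, so a fully general uncurry would need to generate one clause per arity, for example with the unquote fragments technique.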

Asynchronous Tasks with Elixir

2016-09-03T03:00:00-05:00

One of my first contributions to ExDoc, the tool used to produce HTML documentation for Elixir projects, was to improve the performance of the documentation build process. My first approach was to build each module page concurrently, manually sending and receiving messages between processes. Then, as you can see in the Pull Request details, Eric Meadows-Jönsson pointed out that I should look at the Task module. In this article, I’ll try to show you the path that I followed to make that contribution. The original source code was something like this:

def run(modules, config) do
  # ...
  generate_list(modules, all, output, config, has_readme)
  generate_list(exceptions, all, output, config, has_readme)
  generate_list(protocols, all, output, config, has_readme)
  # ...
end

defp generate_list(nodes, all, output, config, has_readme) do
  Enum.each nodes, &generate_module_page(&1, all, output, config, has_readme)
end

defp generate_module_page(node, modules, output, config, has_readme) do
  content = Templates.module_page(node, config, modules, has_readme)
  File.write("#{output}/#{node.id}.html", content)
end

You can see that we can improve the build performance if we generate each module page concurrently. So, let’s do that in a moment!

For the purposes of this article, let me simplify the example above. So, please assume that the following was the original piece of code:

# source: demo.exs
defmodule AsyncTaskDemo do
  def run(nodes, output) do
    if File.exists? output do
      File.rm_rf! output
    end
    File.mkdir_p! output

    generate_list(nodes, output)
  end

  defp generate_list(nodes, output) do
    Enum.each nodes, &generate_module_page(&1, output)
  end

  defp generate_module_page(node, output) do
    name = String.capitalize(node)
    content = EEx.eval_string "Hello <%= name %>", [name: name]
    File.write("#{output}/#{node}.txt", content)
  end
end

As a second step, let’s set up our test suite. In this case, we want to test a single file, demo.exs.

# source: async_test.exs
ExUnit.start()

Code.require_file("demo.exs", __DIR__)

defmodule AsyncTaskDemoTest do
  use ExUnit.Case

  test "generate node pages" do
    nodes = ["john", "jane"]
    output = "doc"
    AsyncTaskDemo.run(nodes, output)

    files = output |> File.ls!() |> Enum.sort()

    assert files == ["jane.txt", "john.txt"]

    result = for f <- files do
       File.read! Path.join(output, f)
    end

    assert result == ["Hello Jane", "Hello John"]
  end
end

If we run our test suite we can see that everything is right:

$ elixir async_test.exs
.

Finished in 0.1 seconds (0.07s on load, 0.07s on tests)
1 test, 0 failures

Randomized with seed 114000

Ok, now it’s time to introduce the concept of asynchronous tasks with Kernel.spawn/1:

defp generate_list(nodes, output) do
  Enum.each nodes, &generate_module_page_async(&1, output)
end

defp generate_module_page_async(node, output) do
  spawn(fn ->
    generate_module_page(node, output)
  end)
end

defp generate_module_page(node, output) do
  # ...
end

At this point, you’ll notice that generate_list/2 now calls a new function named generate_module_page_async/2. This function spawns a new process for each node, and each process generates a module page.

One problem with this approach is that our program does not wait for the result of each invocation of the generate_module_page/2 function. Basically, we’re doing a fire-and-forget concurrent execution, which means that the caller process doesn’t receive any feedback from the spawned function. If we run our test we’ll see that it fails:

$ elixir async_test.exs

  1) test generate node pages (AsyncTaskDemoTest)
     async_test.exs:8
     Assertion with == failed
     code:  files == ["jane.txt", "john.txt"]
     left:  []
     right: ["jane.txt", "john.txt"]
     stacktrace:
       async_test.exs:15: (test)

Finished in 0.07 seconds (0.05s on load, 0.02s on tests)
1 test, 1 failure

Randomized with seed 47515

We can fix this error by doing the following:

# source: demo.exs
  defp generate_list(nodes, output) do
    nodes
    |> Enum.map(&generate_module_page_async(&1, output))
    |> Enum.map(fn _ ->
      receive do
        :ok -> :ok
      end
    end)
  end

  defp generate_module_page_async(node, output) do
    caller = self()
    spawn(fn ->
      send(caller, generate_module_page(node, output))
    end)
  end

  defp generate_module_page(node, output) do
    # ...
  end

Let’s run our tests:

$ elixir async_test.exs
.

Finished in 0.09 seconds (0.06s on load, 0.03s on tests)
1 test, 0 failures

Randomized with seed 474778

Until now, we’ve been assuming that File.write/3 always returns :ok. If for some reason File.write/3 returns an {:error, reason} tuple, we’ll get stuck, because no receive clause matches that message. One way to solve this issue is the following:

# source: demo.exs
  defp generate_list(nodes, output) do
    nodes
    |> Enum.map(&generate_module_page_async(&1, output))
    |> Enum.map(fn _ ->
      receive do
        :ok -> :ok
        {:error, reason} -> IO.puts :stderr, "#{reason}"
      end
    end)
  end

Finally, in case we don’t receive any message at all, we stop waiting after a timeout of 5 seconds:

  defp generate_list(nodes, output) do
    nodes
    |> Enum.map(&generate_module_page_async(&1, output))
    |> Enum.map(fn _ ->
      receive do
        :ok -> :ok
        {:error, reason} -> IO.puts :stderr, "#{reason}"
      after 5000 ->
        IO.puts :stderr, "Timeout"
      end
    end)
  end

With all these changes, we’re ready to send our Pull Request, but wait, there is a better way to do this.

Elixir way: Task Module

As I mentioned at the beginning of this article, Eric pointed out that I should look at the Task module documentation, and he was absolutely right. This module offers a really good abstraction that makes it easy to run simple processes.

Applying the Task.async/1 to our earlier example we cut down our source code to:

defp generate_list(nodes, output) do
  nodes
  |> Enum.map(&Task.async(fn ->
       generate_module_page(&1, output)
     end))
  |> Enum.map(&Task.await/1)
end

Task.async/1 creates a separate process that runs the generate_module_page/2 function. We then collect each task descriptor (returned by Task.async/1) and pass it as the first argument to Task.await/2, relying on its default timeout. This call waits for our background process to finish and returns its value, in this case the result of File.write/3.

You may ask yourself: how does the concurrent version improve the overall performance? Well, that depends. First, we need to take into account that our concurrent program takes advantage of a parallel computer (several processing units). If we run our program on a computer with only one CPU core, parallelism cannot happen.
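Elixir 1.4 and later also ship Task.async_stream/3, which wraps the same async/await pattern and additionally limits how many processes run at once. The sketch below is not what the original Pull Request used; the AsyncStreamSketch module, the write_page/2 helper and the option values are assumptions chosen for illustration.

```elixir
defmodule AsyncStreamSketch do
  # Hypothetical variant of generate_list/2 built on Task.async_stream/3.
  def generate_list(nodes, output) do
    nodes
    |> Task.async_stream(&write_page(&1, output), max_concurrency: 4, timeout: 5_000)
    |> Enum.to_list()
  end

  # Simplified stand-in for generate_module_page/2.
  defp write_page(node, output) do
    File.write(Path.join(output, "#{node}.txt"), "Hello #{String.capitalize(node)}")
  end
end

# Write the pages into a scratch directory under the system temp dir.
output = Path.join(System.tmp_dir!(), "async_stream_sketch")
File.mkdir_p!(output)

IO.inspect(AsyncStreamSketch.generate_list(["john", "jane"], output))
```

Each element of the returned list is an {:ok, result} tuple, in the same order as the input, where result is whatever File.write/3 returned for that node.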

Assume for a moment that the generate_module_page/2 function always takes at least 2 seconds:

  defp generate_module_page(node, output) do
    :timer.sleep(2000)
    name = String.capitalize(node)
    content = EEx.eval_string "Hello <%= name %>", [name: name]
    File.write("#{output}/#{node}.txt", content)
  end

Then, with the following code we can test the performance improvements using a parallel computer:

# performance.exs
Code.require_file("demo.exs", __DIR__)

nodes = ["egg", "bacon", "spam", "sausage", "beans", "brandy", "foo", "baz"]
output = "doc"

before = System.monotonic_time()

AsyncTaskDemo.run(nodes, output)

later = System.monotonic_time()
diff = later - before
seconds = System.convert_time_unit(diff, :native, :second)

IO.puts "Diff: #{seconds} seconds. #{diff} :native time unit"

The results are the following:

# Sequential
$ elixir performance.exs
Diff: 16 seconds. 16122888704 :native time unit
# concurrent
$ elixir performance.exs
Diff: 2 seconds. 2052834417 :native time unit

Our concurrent version is eight times faster than the sequential version :)
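The comparison can also be done in a single script with :timer.tc/1, which returns the elapsed time in microseconds together with the result. This is only a sketch with made-up numbers (four units of work that sleep for 200 ms each), not the benchmark used above.

```elixir
# Each unit of work only sleeps, standing in for a slow page generation.
work = fn -> Process.sleep(200) end

# Run the four units of work one after another.
{sequential_us, _} = :timer.tc(fn -> Enum.each(1..4, fn _ -> work.() end) end)

# Run the four units of work in separate tasks and wait for all of them.
{concurrent_us, _} =
  :timer.tc(fn ->
    1..4
    |> Enum.map(fn _ -> Task.async(work) end)
    |> Enum.each(&Task.await/1)
  end)

IO.puts("sequential: #{div(sequential_us, 1000)} ms")
IO.puts("concurrent: #{div(concurrent_us, 1000)} ms")
```

Note that in this sketch, as in the example with :timer.sleep/1, the work consists of waiting rather than computing, so the concurrent version wins even on a single core. CPU-bound work only gets faster when several cores are available.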

Wrapping up

It is always good to know how concurrency works in Erlang and Elixir, where you can create new lightweight processes with spawn and then send and receive messages to and from those processes. You can also use some of the abstractions provided by OTP (Open Telecom Platform). In general, that’s the way you accomplish concurrency in Erlang. But sometimes you just want to run simple processes, something like background jobs. In those cases, it is good to know about the Task module, a really good Elixir abstraction that keeps us isolated from the details and lets us concentrate on our goals.

As José Valim later tweeted, this was another entry on the “hard things made easier with Elixir” series.

Another entry on the "hard things made easier with Elixir" series: https://t.co/luQ8gJaBpE :)
— José Valim (@josevalim) June 18, 2015

Acknowledgments

Thank you to José Valim, Sebastián Magrí and Ana Rangel for reviewing drafts of this post.

How to document your Javascript code

2014-08-27T12:00:00-05:00

Someone that knows something about Java probably knows about JavaDoc. If you know something about Python you probably document your code following the rules defined for Sphinx (Sphinx uses reStructuredText as its markup language). Or in C, you follow the rules defined for Doxygen (Doxygen also supports other programming languages such as Objective-C, Java, C#, PHP, etc.). But, what happens when we are coding in JavaScript? How can we document our source code?

As a developer that interacts with other members of a team, the need to document all your intentions must become a habit. If you follow some basic rules and stick to them you can gain benefits like the automatic generation of documentation in formats like HTML, PDF, and so on.

I must confess that I’m relatively new to JavaScript, but one of the first things that I set up is the source code documentation. I’ve been using JSDoc to document all my JavaScript code. It’s easy, and you only need to follow a short set of rules.

/**
 * @file Working with Tags
 * @author Milton Mazzarri <me@milmazz.uno>
 * @version 0.1
 */

var Tag = (function () {
  /**
   * The Tag definition.
   *
   * @constructor
   * @param {String} id - The ID of the Tag.
   * @param {String} description - Concise description of the tag.
   * @param {Number} min - Minimum value accepted for trends.
   * @param {Number} max - Maximum value accepted for trends.
   * @param {Object} plc - The ID of the {@link PLC} object where this tag belongs.
   */
  var Tag = function (id, description, min, max, plc) {
    // Store the values on the instance instead of leaking globals.
    this.id = id;
    this.description = description;
    this.trend_min = min;
    this.trend_max = max;
    this.plc = plc;
  };

  /**
   * Get the current value of the tag.
   *
   * @see [Example]{@link http://example.com}
   * @returns {Number} The current value of the tag.
   */
  Tag.prototype.getValue = function () {
    // Placeholder value: call Math.random instead of returning the function.
    return Math.random();
  };

  return Tag;
}());

In the previous example, I have documented the index of the file, showing the author and version, you can also include other things such as a copyright and license note. I have also documented the class definition including parameters and methods specifying the name, and type with a concise description.

After you process your source code with JSDoc the result looks like the following:

In the previous image you can see the documentation in HTML format, including a table that displays the parameters with appropriate links to your source code. JSDoc also applies a very nice style to your document.

If you need further details I recommend you check out the JSDoc documentation.

Grunt: The Javascript Task Manager

2014-06-28T16:00:00-05:00

When you play the Web Developer role, sometimes you may have to endure repetitive tasks like minification, unit testing, compilation, linting, beautifying or unpacking JavaScript code, and so on. To solve these problems while keeping your mental health in good shape, you desperately need to find a way to automate these tasks. Grunt offers you an easy way to accomplish this kind of automation.

In this article I’ll try to explain how to automate some tasks with Grunt, but I recommend that you take some time to read Grunt’s documentation and enjoy the experience by yourself.

So, in the following sections I’ll try to show you how to accomplish these tasks:

Concatenate and create a minified version of your CSS, JavaScript and HTML files.
Automatic generation of the documentation for JavaScript with JSDoc.
Linting your JavaScript code.
Reformat and reindent (prettify) your JavaScript code.

You can install Grunt via npm (Node Package Manager), so, to install Grunt you need to install Node.js first.

Now that you have Node.js and npm installed, it is a good time to install the Grunt CLI (Command Line Interface) package globally.

$ sudo npm install -g grunt-cli

Once grunt-cli is installed you need to go to the root directory of your project and create a package.json file, to accomplish this you can do the following:

$ cd example_project
$ npm init

The previous command will ask you a series of questions in order to create the package.json file, which basically stores metadata for projects published as npm modules. It’s important to remember to add this file to your version control system, so that your teammates can install the development dependencies with the npm install command.

At this point we can install Grunt and add it to the existing package.json with:

$ npm install grunt --save-dev

And the plugins that you need can be installed as follows:

$ npm install <grunt-plugin-name> --save-dev

Please note that the --save-dev parameter will change your devDependencies section in your package.json. So, be sure to commit the updated package.json file with your project whenever you consider appropriate.

Code documentation

Suppose you document your code following the syntax rules defined by JSDoc 3, e.g.:

 /**
  * A callback function returning array defining where the ticks
  * are laid out on the axis.
  *
  * @function tickPositioner
  * @see {@link http://api.highcharts.com/highstock#yAxis.tickPositioner}
  * @param {String} xOrY - Is it X or Y Axis?
  * @param {Number} min - Minimum value
  * @param {Number} max - Maximum value
  * @returns {Array} - Where the ticks are laid out on the axis.
  */
 function tickPositioner(xOrY, min, max) {

    // do something: compute the tick positions
    var tickPositions = [];

    return tickPositions;
 }

If you need more information about JSDoc, read their documentation, it’s easy to catch up.

The next step to automate the generation of the code documentation is to install first the grunt-jsdoc plugin as follows:

$ npm install grunt-jsdoc --save-dev

Once grunt-jsdoc is installed you must create your Gruntfile.js in the root directory of your project and then add the jsdoc entry to the options of the initConfig method.

module.exports = function(grunt) {

  // Project configuration.
  grunt.initConfig({
      pkg: grunt.file.readJSON('package.json'),
      jsdoc : {
          dist : {
              src: ['src/*.js', 'test/*.js'],
              dest: 'doc'
          }
      }
  });

};

Then, you need to load the plugin after the initConfig method in the Gruntfile.js:

// Load the plugin that provides the 'jsdoc' task.
grunt.loadNpmTasks('grunt-jsdoc');

The resulting Gruntfile.js until now is:

module.exports = function(grunt) {

  // Project configuration.
  grunt.initConfig({
      pkg: grunt.file.readJSON('package.json'),
      jsdoc : {
          dist : {
              src: ['src/*.js', 'test/*.js'],
              dest: 'doc'
          }
      }
  });

  // Load the plugin that provides the 'jsdoc' task.
  grunt.loadNpmTasks('grunt-jsdoc');

};

To generate the documentation, you need to call the jsdoc task as follows:

$ grunt jsdoc

Immediately you can see, inside the doc directory, the available documentation in HTML format with some beautiful styles by default.

Linting your JavaScript code

Whether you want to find suspicious, non-portable or potentially problematic JavaScript code, or simply to enforce your team’s coding conventions, I recommend that you include a static code analysis tool in your toolset.

The first step is to define your set of rules. I prefer to specify the set of rules in an independent file called .jshintrc located at the root directory of the project, let’s see an example:

// file: .jshintrc
{
    "globals": {
        "Highcharts": true, // This is only necessary if you use Highcharts
        "module": true // Gruntfile.js
    },
    "bitwise": true,
    "browser": true,
    "camelcase": true,
    "curly": true,
    "eqeqeq": true,
    "forin": true,
    "freeze": true,
    "immed": true,
    "indent": 4,
    "jquery": true,
    "latedef": true,
    "newcap": true,
    "noempty": true,
    "nonew": true,
    "quotmark": true,
    "trailing": true,
    "undef": true,
    "unused": true
}

If you need more details about the checks that each of the rules mentioned above performs, please read the page that contains the list of all options supported by JSHint.

To install the grunt-contrib-jshint plugin, please do as follows:

$ npm install grunt-contrib-jshint --save-dev

Next, proceed to add the jshint entry to the options of the initConfig method in the Gruntfile.js as follows:

jshint: {
 options: {
  jshintrc: '.jshintrc'
 },
 all: ['Gruntfile.js', 'src/*.js', 'test/*.js']
}

Then, as in the previous section, you need to load the plugin after the initConfig method in the Gruntfile.js:

// Load the plugin that provides the 'jshint' task.
grunt.loadNpmTasks('grunt-contrib-jshint');

To validate your JavaScript code against the previous set of rules you need to call the jshint task:

$ grunt jshint

If you need more information or explanations about the errors that you may receive after running the previous command, I suggest that you visit the JSLint Error Explanations site.

Code Style

As I mentioned in the previous section, I suggest that you maintain an independent file where you define the set of rules for your coding style:

// file: .jsbeautifyrc
{
  "indent_size": 4,
  "indent_char": " ",
  "indent_level": 0,
  "indent_with_tabs": false,
  "preserve_newlines": true,
  "max_preserve_newlines": 2,
  "jslint_happy": true,
  "brace_style": "collapse",
  "keep_array_indentation": false,
  "keep_function_indentation": false,
  "space_before_conditional": true,
  "break_chained_methods": false,
  "eval_code": false,
  "unescape_strings": false,
  "wrap_line_length": 0
}

Next, after installing the grunt-jsbeautifier plugin with npm install grunt-jsbeautifier --save-dev, proceed to add the jsbeautifier entry to the options of the initConfig method in the Gruntfile.js as follows:

jsbeautifier: {
  modify: {
    src: 'index.js',
    options: {
      config: '.jsbeautifyrc'
    }
  },
  verify: {
    src: ['index.js'],
    options: {
      mode: 'VERIFY_ONLY',
      config: '.jsbeautifyrc'
    }
  }
}

The next step is to load the plugin after the initConfig method:

// Load the plugin that provides the 'jsbeautifier' task.
grunt.loadNpmTasks('grunt-jsbeautifier');

To adjust your JS files according to the previous set of rules you need to call the jsbeautifier task:

$ grunt jsbeautifier:modify

Concatenate and minify CSS, HTML, and JS

To reduce the size of your CSS, HTML and JS files do as follows:

$ npm install grunt-contrib-uglify --save-dev
$ npm install grunt-contrib-htmlmin --save-dev
$ npm install grunt-contrib-cssmin --save-dev

Next, add the htmlmin, cssmin and uglify entries to the options of the initConfig method in the Gruntfile.js as follows:

htmlmin: {
  dist: {
    options: {
      removeComments: true,
      collapseWhitespace: true
    },
    files: {
      'dist/index.html': 'src/index.html',     // 'destination': 'source'
      'dist/contact.html': 'src/contact.html'
    }
  }
},
cssmin: {
  add_banner: {
    options: {
      banner: '/* My minified css file */'
    },
    files: {
      'path/to/output.css': ['path/to/**/*.css']
    }
  }
},
uglify: {
 options: {
   banner: '/* <%= grunt.template.today("yyyy-mm-dd") %> */\n',
   separator: ',',
   compress: true,
 },
 chart: {
   src: ['src/js/*.js'],
   dest: 'dist/js/example.min.js'
 }
}

The next step is to load the plugins after the initConfig method:

// Load the plugin that provides the 'uglify' task.
grunt.loadNpmTasks('grunt-contrib-uglify');
// Load the plugin that provides the 'htmlmin' task.
grunt.loadNpmTasks('grunt-contrib-htmlmin');
// Load the plugin that provides the 'cssmin' task.
grunt.loadNpmTasks('grunt-contrib-cssmin');

To adjust your files according to the previous set of rules you need to call the htmlmin, cssmin or uglify tasks:

$ grunt htmlmin
$ grunt cssmin
$ grunt uglify

After loading all of your Grunt tasks you can create some aliases.

If you want to create an alias that runs the three previous tasks in one step do as follows:

// Wrapper around htmlmin, cssmin and uglify tasks.
grunt.registerTask('minified', [
  'htmlmin',
  'cssmin',
  'uglify'
]);

So, to execute the three tasks in one step you can do the following:

$ grunt minified

The previous command is a wrapper around the htmlmin, cssmin and uglify tasks defined before.

Putting it all together gives us the complete Gruntfile.js.
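As a rough sketch of how the pieces from this section and the code style section could fit into one file, the following combines the configuration entries, the plugin loading and the alias shown above. It is not the original Gruntfile.js: it only covers the tasks visible here, the function name gruntfile is an assumption made for illustration, and the separator option is left out of the uglify entry for brevity.

```javascript
// Sketch of a combined Gruntfile.js (assumption: only the tasks from the
// code style and minification sections are included).
function gruntfile(grunt) {
  grunt.initConfig({
    jsbeautifier: {
      modify: { src: 'index.js', options: { config: '.jsbeautifyrc' } },
      verify: {
        src: ['index.js'],
        options: { mode: 'VERIFY_ONLY', config: '.jsbeautifyrc' }
      }
    },
    htmlmin: {
      dist: {
        options: { removeComments: true, collapseWhitespace: true },
        files: {
          'dist/index.html': 'src/index.html', // 'destination': 'source'
          'dist/contact.html': 'src/contact.html'
        }
      }
    },
    cssmin: {
      add_banner: {
        options: { banner: '/* My minified css file */' },
        files: { 'path/to/output.css': ['path/to/**/*.css'] }
      }
    },
    uglify: {
      options: {
        banner: '/* <%= grunt.template.today("yyyy-mm-dd") %> */\n',
        compress: true
      },
      chart: { src: ['src/js/*.js'], dest: 'dist/js/example.min.js' }
    }
  });

  // Load every plugin that provides one of the configured tasks.
  [
    'grunt-jsbeautifier',
    'grunt-contrib-uglify',
    'grunt-contrib-htmlmin',
    'grunt-contrib-cssmin'
  ].forEach(function (plugin) {
    grunt.loadNpmTasks(plugin);
  });

  // Alias that runs the three minification tasks in one step.
  grunt.registerTask('minified', ['htmlmin', 'cssmin', 'uglify']);
}

module.exports = gruntfile;
```

Tasks configured in earlier sections would be added to the same initConfig call in the same way.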

Other plugins

Grunt offers a very large number of plugins, and the project has a dedicated section that lists them!

I recommend that you take some time to check out the following plugins:

grunt-newer lets you configure Grunt tasks to run with newer files only.
grunt-contrib-watch lets you run predefined tasks whenever file patterns are added, changed or deleted.
grunt-contrib-imagemin lets you minify images.
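To give an idea of how the first two plugins combine, here is a minimal sketch that re-runs uglify whenever a script changes, and only for the files that are newer than the last run. The target name scripts and the function name gruntfileWithWatch are assumptions for illustration; the newer: prefix is the mechanism grunt-newer uses to wrap an existing task.

```javascript
// Hypothetical Gruntfile.js fragment: watch the scripts and minify only
// the files that changed since the last successful run.
function gruntfileWithWatch(grunt) {
  grunt.initConfig({
    uglify: {
      chart: { src: ['src/js/*.js'], dest: 'dist/js/example.min.js' }
    },
    watch: {
      scripts: {
        files: ['src/js/*.js'],  // patterns to watch
        tasks: ['newer:uglify']  // 'newer:' limits uglify to changed files
      }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-uglify');
  grunt.loadNpmTasks('grunt-newer');
  grunt.loadNpmTasks('grunt-contrib-watch');
}

module.exports = gruntfileWithWatch;
```

With this in place, running grunt watch keeps the process alive and triggers the tasks on every matching change.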

Conclusion

Certainly, Grunt does a great job when it comes to task automation. Automating these repetitive tasks is fun and reduces the errors associated with tasks that we used to run manually over and over again. That being said, I definitely recommend that you try Grunt: in exchange you will get an improved development workflow, you can stop enduring the repetitive tasks that we all hate and, as a result, you will have more free time to focus on getting things done.

Last but not least, an alternative to Grunt is Gulp. I recently read about it, and if you look at its documentation you will notice that its syntax is cleaner than Grunt’s. It is also very concise and allows chaining calls, following the concept of streams that pipes offer in *nix-like systems, so I’ll give it a try in my next project.

The DRY principle

2014-06-24T19:00:00-05:00

The DRY (Don’t Repeat Yourself) principle basically consists of the following:

Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

That said, it’s clear that the DRY principle is against code duplication, something that hurts the maintenance phase in the long term: it makes improvements and refactoring harder and, in some cases, it can introduce contradictions, among other problems. Recently I inherited a project, and one of the things that I noticed in some parts of the source code was the following:

At a glance, you could save some bytes by applying the facade and module patterns without breaking API compatibility, in this way:

But if you have read the jQuery documentation, it’s obvious that the previous code portions go against the DRY principle: basically, these functions expose some shorthands for the $.ajax method from the jQuery library. That said, it’s important to clarify that jQuery, since version 1.0, already offers some shorthands: $.get(), $.getJSON(), $.post(), among others.

So, in this particular case, I preferred to break backward compatibility and delete the previous code portions. After that, I changed the code that used those functions, and from that point on I only used the shorthands that jQuery offers. Some examples may clarify this thought:

Another advantage of using the shorthand methods that jQuery provides is that you can work with Deferreds. One last thing that we must take into consideration is that the jqXHR.success() and jqXHR.error() callback methods are deprecated as of jQuery 1.8.
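As an illustration of the last two points, here is a minimal sketch that uses the built-in shorthands together with the Deferred-style callbacks. The function names (loadUsers, saveUser) and the /api/users URL are hypothetical and not taken from the inherited project; jQuery is passed in as a parameter only to keep the snippet independent of how the library is loaded.

```javascript
// Hypothetical example: function names and URLs are illustrative only.
// `$` is the jQuery object, passed in explicitly.
function loadUsers($, onUsers, onError) {
  // $.getJSON returns a jqXHR object, which implements the Promise interface.
  return $.getJSON('/api/users')
    .done(onUsers)   // use .done() instead of the deprecated jqXHR.success()
    .fail(onError);  // use .fail() instead of the deprecated jqXHR.error()
}

function saveUser($, user) {
  // $.post is a shorthand for $.ajax with a POST request.
  return $.post('/api/users', user);
}
```

Because both functions return the jqXHR object, callers can keep chaining .done(), .fail() or .always() on the result.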

Anyway, I wanted to share my experience in this case. I also want to remark that we need to keep some principles in mind when we develop software, and avoid reinventing the wheel or overengineering.

Last but not least, one way to read documentation offline that I tend to use is Dash or Zeal: I can reach a bunch of documentation without the need for an Internet connection. Give it a try!

libturpial needs your help

2014-03-10T16:00:00-05:00

Do you want to begin contributing to the libturpial codebase but you don’t know where to start? That’s ok, it also happens to me sometimes. But today, the reality is that we need your help to fix some errors reported by our style checker (flake8). These errors are:

E126: continuation line over-indented for hanging indent
E128: continuation line under-indented for visual indent
E231: missing whitespace after :
E251: unexpected spaces around keyword / parameter equals
E261: at least two spaces before inline comment
E301: expected 1 blank line
E302: expected 2 blank lines
E303: too many blank lines
E501: line too long
E711: comparison to None should be if cond is not None:
E712: comparison to False should be if cond is False: or if not cond:
F401: imported but unused
F403: unable to detect undefined names
F821: undefined name
F841: local variable is assigned to but never used
F999: syntax error in doctest
W291: trailing whitespace
W391: blank line at end of file
W601: .has_key() is deprecated, use in

As you can see, some errors are really easy to fix, but there are too many for us. So please, we desperately need your help. Help us!

The following is how we expect the community to contribute code to libturpial via Pull Requests.

1) If you don’t have a GitHub account, please create one first.

2) Fork our project

3) Install Git. After that, you should set up your name and email:

    $ git config --global user.name "Your Name, not your Github nickname"
    $ git config --global user.email "you@example.com"

4) Set up your local repository

    $ git clone https://github.com/your_github_nickname/libturpial.git

5) Set satanas/libturpial as your upstream remote. This basically tells Git that you are referencing libturpial’s repository as your main source.

    $ git remote add upstream https://github.com/satanas/libturpial.git

5.1) Update your local copy

    $ git fetch upstream

6) Verify that no one else has been working on the same bug. You can check that in our issues list, where you can also see the Pull Requests pending the BDFL’s approval.

7) Working on a bug

7.1) First, install tox

    $ pip install tox

7.2) Then, create a branch that identifies the bug that you will begin to work on:

    $ git checkout -b E231 upstream/development

In this example we are working on the bugs of the type: E231 (as indicated by our style checker, flake8)

7.3) Make some local changes and commit, repeat.

7.3.1) Delete the error code that you will work on from the ignore list in the flake8 section of the tox.ini file (located at the root of the project)

Example:

    # Original list:
    ignore = E126,E128,E231,E251,E261,E301

    # After we decide to work on the E231 error:
    ignore = E126,E128,E251,E261,E301

7.3.2) Execute tox -e py27 to check the current errors.

7.3.3) Fix, fix, fix…

7.3.4) Commit your changes

    $ git commit -m "Fixed errors 'E231' according to flake8."

7.4) In the case that you fixed all errors of the same type, please delete the corresponding line in the tox.ini file (located at the root of the project)

7.4.1) Don’t forget to commit that.

8) Publish your work

    $ git push origin E231

Please adjust the name E231 to something more appropriate in your case.

9) Create a Pull Request and don’t hesitate to bug the main maintainer until your changes get merged and published. Last but not least, don’t forget to add yourself as an author in the AUTHORS file, located at the root of the project.

10) Enjoy your work! :-)

If you want to help us even more, you can check our issues list. You can also check out the Turpial project, our light, fast and beautiful microblogging client written in Python, which currently supports Twitter and identi.ca.

You can find more details in our guide. Also, in case of doubt, don’t hesitate to reach out to us.

About libturpial

libturpial is a Python library that handles multiple microblogging protocols. It implements a lot of features and aims to support all the features for each protocol. At the moment it supports Twitter and Identi.ca and is the backend used for Turpial.

About Turpial

Turpial is an alternative client for microblogging with multiple interfaces. At the moment it supports Twitter and Identi.ca and works with Gtk and Qt interfaces.

