Тех-Детали: 2017

понедельник, 14 августа 2017 г.

Passing ByteString contents reliably into C code

The question was raised a week ago on Reddit. Recently I implemented the elaborated solution in nginx-haskell-module. In this article I will talk about why I wanted to pass ByteStrings contents directly, what obstacles lay in this way, and what was the best way to implement this. The answer to the question why is quite simple and straightforward. Many Haskell handlers from ngx-export, including all asynchronous and content handlers pass data from lazy ByteStrings to the C side to be further used in Nginx variable or content handlers. Before version 1.3 of the nginx-haskell-module the data was merely copied into a single buffer (for variable handlers) or multiple buffers (for content handlers). Below are both the historical functions.

toSingleBuffer :: L.ByteString -> IO (Maybe CStringLen)
toSingleBuffer EmptyLBS =
    return $ Just (nullPtr, 0)
toSingleBuffer s = do
    let I l = L.length s
    t <- safeMallocBytes l
    if t /= nullPtr
        then do
            void $ L.foldlChunks
                (\a s -> do
                    off <- a
                    let l = B.length s
                    B.unsafeUseAsCString s $ flip (copyBytes $ plusPtr t off) l
                    return $ off + l
                ) (return 0) s
            return $ Just (t, l)
        else return Nothing

toBuffers :: L.ByteString -> IO (Maybe (Ptr NgxStrType, Int))
toBuffers EmptyLBS =
    return $ Just (nullPtr, 0)
toBuffers s = do
    t <- safeMallocBytes $
        L.foldlChunks (const . succ) 0 s * sizeOf (undefined :: NgxStrType)
    l <- L.foldlChunks
        (\a s -> do
            off <- a
            maybe (return Nothing)
                (\off -> do
                    let l = B.length s
                    -- l cannot be zero at this point because intermediate
                    -- chunks of a lazy ByteString cannot be empty which is
                    -- the consequence of Monoid laws applied when it grows
                    dst <- safeMallocBytes l
                    if dst /= nullPtr
                        then do
                            B.unsafeUseAsCString s $ flip (copyBytes dst) l
                            pokeElemOff t off $ NgxStrType (fromIntegral l) dst
                            return $ Just $ off + 1
                        else do
                            mapM_
                                (peekElemOff t >=> \(NgxStrType _ x) -> free x)
                                [0 .. off - 1]  -- [0 .. -1] makes [], so wise!
                            free t
                            return Nothing
                ) off
        ) (return $ if t /= nullPtr then Just 0 else Nothing) s
    return $ l >>= Just . (,) t

where

safeMallocBytes :: Int -> IO (Ptr a)
safeMallocBytes =
    flip catchIOError (const $ return nullPtr) . mallocBytes

pattern I i                 <- (fromIntegral -> i)
pattern PtrLen s l          <- (s, I l)
pattern PtrLenFromMaybe s l <- (fromMaybe (nullPtr, -1) -> PtrLen s l)
pattern EmptyLBS            <- (L.null -> True)

What imports are needed and what NgxStrType is, you can find in the original module. These functions firstly allocate a storage for a single buffer containing all the buffers of the original ByteString (toSingleBuffer), or a storage of storages corresponding to every single buffer of the ByteString (toBuffers). In case of allocation error the functions return Nothing. The caller (which is still a Haskell function that represents a specific Haskell handler on the C side) interprets this as an allocation error and returns a tuple (nullPtr, -1) of kind (bufs, n_bufs) via the pattern PtrLenFromMaybe. In both the functions, data from the original ByteString gets copied with copyBytes. User of the functions must free the data at some point. Imagine the following simple version of toBuffers that directly passes the ByteString buffers to the C side.

toBuffers :: L.ByteString -> IO (Ptr NgxStrType, Int)
toBuffers EmptyLBS =
    return (nullPtr, 0)
toBuffers s = do
    t <- safeMallocBytes $
        L.foldlChunks (const . succ) 0 s * sizeOf (undefined :: NgxStrType)
    if t == nullPtr
        then return (nullPtr, -1)
        else (,) t <$>
                L.foldlChunks
                    (\a s -> do
                        off <- a
                        B.unsafeUseAsCStringLen s $
                            \(s, l) -> pokeElemOff t off $
                                NgxStrType (fromIntegral l) s
                        return $ off + 1
                    ) (return 0) s

Now we return the tuple directly and put in its first element (the storage of storages that now transforms to a storage of references) the references to the original ByteString buffers. Later, the data gets passed to the C side. Very simple, no extra allocations, no copying, no obligations of freeing data on the C side. Is this really possible? How does life-time of the internal ByteString data correspond to desired life-time of the data on the C side? The answer is simple: there is no correspondence between them! After returning from a Haskell handler on the C side, the passed (or better to say poked) contents of the ByteString can be easily freed by the Haskell garbage collector, because nothing refers to the ByteString on the Haskell side anymore. This is a bad news! Nginx uses epoll() (or a similar mechanism for feeding tasks when epoll() is not available), and this means that we may need the references to stay valid during unpredictable period of time. How could we ensure validity of references? The StablePtr to the rescue!

A stable pointer is a reference to a Haskell expression that is guaranteed not
to be affected by garbage collection, i.e., it will neither be deallocated nor
will the value of the stable pointer itself change during garbage collection
(ordinary references may be relocated during garbage collection). Consequently,
stable pointers can be passed to foreign code, which can treat it as an opaque
reference to a Haskell value.

So, a stable pointer guarantees that an objects it points to will be alive until the pointer gets freed (read the docs further for details). We could pass a StablePtr to the ByteString along with the references to the ByteString buffers to the C code, and free the pointer via calling a special imported Haskell handler like

foreign export ccall ngxExportReleaseLockedByteString ::
    StablePtr L.ByteString -> IO ()

ngxExportReleaseLockedByteString :: StablePtr L.ByteString -> IO ()
ngxExportReleaseLockedByteString = freeStablePtr

when the references are no longer needed (e.g. at the end of the HTTP request). But in the documentation on the StablePtr nothing is said about possible relocations of the object it points to. Normally, the garbage collector moves alive objects to a new heap when the old heap gets removed (see a brilliant article here). Oh, no! Is the StablePtr not a cure? Relax… We do not need the ByteString itself actually. Remember? We need its buffers only! Digging into ByteString implementation reveals that the buffers are allocated in special pinned byte arrays using function newPinnedByteArray#. The docs say about it (more precisely, about newPinnedByteArray, but it merely calls the former):

Create a pinned byte array of the specified size. The garbage collector is
guaranteed not to move it.

Thus, the Haskell garbage collector will leave the internal ByteString buffers in their original places. On the other hand, the StablePtr guarantees aliveness of the ByteString until it’s freed (hence, aliveness of its not-relocatable buffers too). This is all that we really need, and this seems to be the best solution. Update. Function ngxExportReleaseLockedByteString is not really necessary because freeStablePtr is actually an imported C function hs_free_stable_ptr() declared in HsFFI.h which can be called from C code directly.

среда, 17 мая 2017 г.

Асинхронная обработка тела клиентского запроса в nginx-haskell-module

Только что добавил данную функциональность в модуль. Вот ссылка на соответствующую главу в документации. В modular подходе нужны экспортеры ngxExportAsyncOnReqBody, которые доступны в новой версии 0.3.0.0 пакета ngx-export. Протестировать хэндлеры reqBody, reqHead и reqFld из примера в test/tsung/nginx-async.cfg можно с помощью curl. Например,

curl --data-binary @path/to/some/file 'http://localhost:8010/rb?a=10'

выведет файл, посланный в POST запросе, его первые 10 строк, а также, скорее всего неуспешный, результат поиска значения поля 10 из содержимого посланного файла, которое хэндлер reqFld попытается проинтерпретировать как форму HTML. Для больших файлов возможны два варианта отказа: в случае очень большого файла nginx скажет, что не смог принять его содержимое ввиду большого размера — в этом случае можно увеличить значение максимального размера тела запроса, равное по умолчанию 1 Мб, стандартной директивой client_max_body_size. Файл поменьше может быть сохранен во временный файл, и модуль сообщит об этом в лог на уровне alert — в этом случае можно подкрутить значение стандартной директивы client_body_buffer_size, равное по умолчанию 8 или 16 Кб. А теперь давайте пошлем форму.

curl --data 'a=23&b=14&c=6' 'http://localhost:8010/rb?a=b'
>>> BODY

a=23&b=14&c=6
>>> BODY HEAD b


>>> FIELD b

14

Тело выведено верно. Значение b аргумента запроса a не соответствует числу, поэтому выводится 0 первых строк тела запроса — здесь все хорошо. Ну и значение поля b из тела-формы равно 14 — и здесь все верно. И напоследок ссылка на презентацию модуля, которую я представлял на Spb Fprog Meetup 27 апреля этого года. В ней, естественно, еще нет упоминаний о хэндлерах тела запроса.

среда, 4 января 2017 г.

nginx-haskell-module: labeled media routing example

In this article I will show how to setup HTTP routing with nginx and Nginx Haskell module to mutable arrays of geographically distributed labeled media storages, front-ended by REST API media managers. Let's assume that the managers accept only 3 types of requests: /conf, /read and /write. On the /conf request a manager must return a JSON object with a list of all media storages it manages with their parameters: labels, hints, mode (RW or RO) and timestamp referring to the last update of the data. On the /read and the /write requests a manager must try to read from or write to a media storage according to values of the request parameters label or hint respectively. On successful /write request the label of the affected media storage must be returned: thus the client will be able to provide the label value in subsequent /read requests. If a /read or a /write request was unsuccessful then HTTP 404 status must be returned. I won't discuss implementation of the media managers: the spoken requirements and clear understanding that they must serve as transparent transformers of the REST requests proxied via media routers to the media storage layer is enough for the purpose of the article. The media routers is another story. They must serve user's /read and /write requests properly proxying them to those media managers which know about supplied label or hint. To collect information about available labels and hints and keep it valid, the routers must regularly poll their own media managers and other routers, or they can sign into a message queue to obtain updates asynchronously. For the sake of simplicity I will use the polling model: in this case on any unexpected response a client may assume that the media storage layout was altered but the router had not received the update yet, and simply retry the request in a minute or what the router polling interval is. Active polling requires asynchronous services from the nginx-haskell-module. Let's poll other media routers with /conf request as for media managers, expecting that they return a JSON object with a list of configurations collected from their own media managers. The model is depicted in the following image.

There are 3 geographically (or by other criteria) separated areas in the LAN which are colored differently. On front of every area a single Labeled Media Router stands (LMR 1, LMR 2 and LMR 3): this manifests the closest to the client's area (i.e. WAN) layer called Routers (to be precise, all the routers are depicted as double towers: it means that they may stand after a load balancer). The second layer, Managers, consists of media managers bound to media routers (LMM 1 through LMM 7): these bounds are statically defined in the nginx configuration. Some media managers are doubled as media routers: the situation when many routers are bound to many managers via a load balancer affects the collected data update algorithm in a media router. The lowest layer, Media, consists of media storages that may change dynamically (including addition and removal of new storages and altering their parameters). The media storages must have labels (dir_1, dir_2 etc.), a mode (RW or RO) and, optionally, a list of hints. On the image, some storages have a single hint fast. If the list of hints is empty then a storage is supposed to have hint default. Considering that storages with the same label are replicated somehow implies that reading from a storage in a different area is safe: normally data on such storages is expected to be equal, however when data is not found, the media manager must return status 404 and the router must skip to another area. Presence of replication is depicted by blue dashed arrows for storages with label dir_1, other storages with equal labels are also replicated but I did not show this to not overload the image. Mutual polling between all media routers with /conf requests is shown with bold red arrows. An example of user's /read request is shown on the image. The user requested reading a file from label dir_1. Router LMR 1 found that the request could be proxied to media manager LMM 1, but it responded with status 404 and the router passed the request to media manager LMM 4 from another area, having skipped another media manager LMM 3 from its own area. An interesting scenario with two proxy_pass actions in a single request! I'll show how to achieve this later, but now I want to specify requirements to /read and /write requests and the media router behavior more accurately.

/read
- parameter label must be provided, it is supposed to have been given in the /write response from the media manager where data was actually written,
- media storages with RW or RO mode can be chosen,
- if a media manager returns status 404 then the router must skip other media managers in the current area that hold this label behind,
- if a media manager is not accessible or returns 502, 503 and 504 then the router must try to pass the request to another media manager in the current area that hold the label behind or to a manager from another area if there is no suitable managers in the current area,
- if there are more than one managers that hold the label behind then they must be chosen in round-robin manner in every new request to the given nginx worker process,
- if a media manager is not accessible or returns 502, 503 and 504 then it must be blacklisted for a time period specified in the nginx configuration,
- if the list of suitable media managers get exhausted while no successful response has been received then the media router must return status 503.
/write
- parameter hint can be provided, if it was not then it is supposed to be default,
- only media storages with RW mode can be chosen,
- other rules from the /read clause that regard to statuses 404, 502, 503 and 504, inaccessibility, round-robin cycling and media managers exhaustion are applied here too.

Now let's turn to nginx configuration file. Below is a configuration for a media router.

user                    nginx;
worker_processes        2;

events {
    worker_connections  1024;
}

http {
    default_type        application/octet-stream;
    sendfile            on;

    include nginx-lmr-http-rules.conf;

    haskell_run_service queryEndpoints $hs_all_backends
        'Conf { updateInterval = Sec 20
              , blacklistInterval = Min 1
              , backends = ("/conf",
                                ["127.0.0.1:8011"
                                ,"127.0.0.1:8012"
                                ,"127.0.0.1:8013"
                                ]
                           )
              , partners = ("/conf",
                                ["127.0.0.1:8020"
                                ,"127.0.0.1:8030"
                                ]
                           )
              }';

    server {
        listen       8010;
        server_name  main;
        error_log    /var/log/nginx/lmr-error.log;
        access_log   /var/log/nginx/lmr-access.log;

        include nginx-lmr-server-rules.conf;
    }
}

Look at the directive haskell_run_service. It passes a typed data Conf to the haskell module being loaded from the shared library lmr.so in file nginx-lmr-http-rules.conf which content is shown below.

haskell load /var/lib/nginx/lmr.so;
#haskell rts_options -l;

haskell_var_nocacheable $hs_msg
                        $hs_seqn
                        $hs_key
                        $hs_start
                        $hs_idx
                        $hs_backend
                        $hs_status;

haskell_var_compensate_uri_changes $hs_msg;

I put haskell code in a separate source file lmr.hs because I could not have it put in the configuration even if I wanted: length of an nginx variable's value cannot exceed 4096 bytes when it is read from a configuration file, but content of the lmr.hs does exceed. Type Conf is defined in lmr.hs as

data Conf = Conf { updateInterval    :: TimeInterval
                 , blacklistInterval :: TimeInterval
                 , backends          :: (Url, [Destination])
                 , partners          :: (Url, [Destination])
                 } deriving (Read)

type Destination = String   -- IP address or domain name
type Url = String           -- normally must start with /

data TimeInterval = Hr Int
                  | Min Int
                  | Sec Int
                  | HrMin (Int, Int)
                  | MinSec (Int, Int)
                  deriving (Read)

Function queryEndpoints is a service that will run every updateInterval, i.e. 20 sec: this is the negotiation process between media routers (the partners) depicted in the image with bold red arrows, blacklistInterval is a global reference that defines for how long time inaccessible media routers must be blacklisted, backends are own media servers (i.e. LMM 1, LMM 2 and LMM 3 as for LMR 1 in terms of the image) with the URL to their /conf location, and partners are other media routers (i.e. LMR 2 and LMR 3 as for LMR 1) with the URL to their /conf location. Below is definition of queryEndpoints.

queryEndpoints (C8.unpack -> conf) firstRun = do
    let Conf (toSec -> upd) (toSec -> bl) (url, own) (purl, remote) =
            readDef (Conf (Hr 24) (Min 1) ("", []) ("", [])) conf
    if firstRun
        then atomicWriteIORef blInterval bl
        else threadDelaySec upd
    (M.fromList -> obd) <-
        mapConcurrently
            (\d -> catchBadResponseOwn d $
                second (fromMaybe (BackendData 0 M.empty) . decode) <$>
                    query url d
            ) own
    (M.fromList . ((Own, obd) :) . map (first Remote) -> abd) <-
        mapConcurrently
            (\d -> catchBadResponseRemote d $
                second (fromMaybe M.empty . decode) <$>
                    query purl d
            ) remote
    oldbd <- readIORef allBackends
    let allbd =
            M.mapWithKey
                (\p -> M.mapWithKey
                    (\d a -> case M.lookup p oldbd >>= M.lookup d of
                        Just v@(BackendData ts _) ->
                            if ts > timestamp a then v else a
                        _ -> a
                    )
                ) abd
        newRoutes = toRoutes allbd
    atomicWriteIORef allBackends allbd
    oldRoutes <- fromRRRoutes . snd . snd <$> readIORef routes
    when (newRoutes /= oldRoutes) $ do
        rr <- mkRRRoutes newRoutes
        atomicModifyIORef routes $ \((a, _), o) -> ((o, (a + 2, rr)), ())
    return $ encode newRoutes
    where query u = runKleisli $ arr id &&& Kleisli (getUrl . flip mkAddr u)
          mkAddr = (("http://" ++) .) . (++)
          catchBadResponse f d = handle $
              \(_ :: SomeException) -> ((d, ) . f d) <$> readIORef allBackends
          catchBadResponseOwn = catchBadResponse $
              \d -> fromMaybe (BackendData 0 M.empty) . M.lookup d .
                  fromMaybe M.empty . M.lookup Own
          catchBadResponseRemote = catchBadResponse $
              \d -> fromMaybe M.empty . M.lookup (Remote d)
          threadDelaySec = threadDelay . (* 1e6)
          toSec (Hr h)          = 3600 * h
          toSec (Min m)         = 60 * m
          toSec (Sec s)         = s
          toSec (HrMin (h, m))  = 3600 * h + 60 * m
          toSec (MinSec (m, s)) = 60 * m + s
ngxExportServiceIOYY 'queryEndpoints

getResponse url = fmap responseBody . (parseRequest url >>=)

httpManager = unsafePerformIO $ newManager defaultManagerSettings
{-# NOINLINE httpManager #-}

getUrl url = getResponse url $ flip httpLbs httpManager

The function returns an encoded JSON object (written into variable $hs_all_backends on the nginx side) wrapped inside IO Monad: this means that queryEndpoints performs several actions with side effects sequentially. The first action is reading data Conf passed from the nginx configuration. Then it updates a global reference blInterval if the service runs for the first time, or waits 20 sec otherwise. The next action is querying own media managers (backends) and then querying other media routers (partners), and collecting results in allbd with respect to timestamps received from the backends which are compared with data collected on the previous iteration (oldbd which is read from the global reference allBackends). Then allbd gets written in the allBackends. So far queryEndpoints updates data with type CollectedData that closely represents the original layout of partners and backends.

type Label = String
type Hint = String

type Timestamp = Integer

data BackendData = BackendData { timestamp :: Timestamp
                               , labels    :: Map Label LabelData
                               } deriving (Generic, Show)
instance FromJSON BackendData
instance ToJSON BackendData

data Possession = Own
                | Remote Destination
                deriving (Generic, Read, Show, Eq, Ord)
instance ToJSON Possession
instance ToJSONKey Possession

data Mode = RW
          | RO
          deriving (Generic, Read, Show, Eq, Ord)
instance FromJSON Mode
instance ToJSON Mode

type PartnerData = Map Destination BackendData
type CollectedData = Map Possession PartnerData

data LabelData = LabelData { mode :: Mode
                           , hint :: [Hint]
                           } deriving (Generic, Show)
instance ToJSON LabelData
instance FromJSON LabelData

Next actions in the function queryEndpoints are doing transformation from allbd to newRoutes of type (Routes, Routes) adapted for being used in the search algorithm both for labels (/read) and hints (/write). The first element of the tuple contains elements laid out for /read requests, the second element is for /write requests.

type Backends = Map Possession [Destination]
type Routes = Map Hint (Map Label Backends)

both :: Arrow a => a b c -> a (b, b) (c, c)
both = join (***)

toRoutes :: CollectedData -> (Routes, Routes)
toRoutes = both (mkRoutes . sort) .
    (
        map (first show) . filter ((AnyHint ==) . fst) . map snd &&&
        map (show *** first (const "any")) . filter ((AnyHint /=) . fst) .
            fromMaybe [] . M.lookup RW . combineByFst .
                sortBy (compare `on` fst)
    ) .
    concatMap (\(m, hs, l, p, d) -> [(m, (h, (l, (p, d)))) | h <- hs]) .
        concatMap (\(p, v) ->
            concatMap (\(d, v) ->
                map (\(l, v) -> (mode v, ckHint $ hint v, l, p, d)) $
                    M.assocs $ labels v) $ M.assocs v) . M.assocs
    where ckHint h = AnyHint :
              if null h then [PlainHint "default"] else map PlainHint h
          combineByFst :: Ord a => [(a, b)] -> Map a [b]
          combineByFst =
              M.fromDistinctAscList .
                  map (uncurry (foldr $ \b (c, a) -> (c, snd b : a)) .
                        ((, []) . fst . head &&& id)) . groupBy ((==) `on` fst)
          mkRoutes =
              M.map (M.map $ M.map (map head . group . sort) . combineByFst) .
                  M.map combineByFst . combineByFst

Thus, the type Routes is a Map of Map of Map. The innermost Map's key is Possession that is just a distance to backends (Own or Remote addr), the value is [Destination]: a list of addresses of all backends that correspond to specific hint and label which is respectively keys of the outer and the middle Maps. As soon as Possession keys are always sorted with Own as the first element (as it should derive Ord expectedly), the own media managers will always be the first in the search hierarchy. The hints are unrelated for /read requests whereas labels are unrelated for /write requests, that's why a special tagged AnyHint is used for the read part of the returned Routes tuple, and value "any" is used as the label for the write part of the tuple. Below is the definition of TaggedHint.

data TaggedHint = AnyHint
                | PlainHint Hint
                deriving Eq
instance Show TaggedHint where
    show AnyHint       = "any"
    show (PlainHint x) = x

So, the hint in the search hierarchy of Routes in the read part will also look as "any". After making newRoutes in queryEndpoints we firstly check if it's equal to the stripped value held in the second element of a tuple from the global reference routes defined as

type RRBackends = Map Possession ([Destination], Maybe (RoundRobin Int))
type RRRoutes = Map Hint (Map Label RRBackends)
type SeqNumber = Integer

routes :: IORef ((SeqNumber, (RRRoutes, RRRoutes)),
                 (SeqNumber, (RRRoutes, RRRoutes)))
routes = unsafePerformIO $ newIORef ((-1, (M.empty, M.empty)),
                                     (0,  (M.empty, M.empty)))
{-# NOINLINE routes #-}

(type RoundRobin is imported from Data.RoundRobin of package roundRobin). The routes' tuple holds two elements with search hierarchies enriched with round-robin elements: the old and the current. The old value is held to avoid loss of the search path in a request currently being processed when the search hierarchy gets altered after an asynchronous update in queryEndpoints: each request gets tagged with the SeqNumber from the current element of the tuple and is supposed to survive two sequential updates of the search hierarchy (if by some reason a request fails to trace down all the search path until a successful response and the element with its SeqNumber in the search hierarchy has been already rewritten then it gets finalized with writeFinalMsg in getMsg, see below). Let's get back to the action in queryEndpoints. If newRoutes and the stripped (i.e. without round-robin elements) second tuple of the routes differ then the second element of routes (i.e. current routes) gets moved to the first place (it becomes old) and the enriched copy of newRoutes gets put on the second place. Stripping enriched routes is performed in function fromRRRoutes.

fromRRRoutes :: (RRRoutes, RRRoutes) -> (Routes, Routes)
fromRRRoutes = both $ M.map $ M.map $ M.map fst

Enriching plain routes with round-robin elements is performed in function mkRRRoutes.

bothKleisli :: Monad m => (a -> m b) -> (a, a) -> m (b, b)
bothKleisli = runKleisli . both . Kleisli

mkRRRoutes :: (Routes, Routes) -> IO (RRRoutes, RRRoutes)
mkRRRoutes = bothKleisli $ mapM $ mapM $ mapM $
    runKleisli $
        arr id &&&
        Kleisli (
                    maybe (return Nothing) (fmap Just) .
                        (nonEmpty . flip take [1 ..] . length >=> ckLen >=>
                            Just . newRoundRobin)
                )
    where ckLen x@(_ :| _ : _) = Just x
          ckLen _ = Nothing

So, if the innermost Map's value of the Routes, i.e. the list of backends contains N elements and N is more than one then it gets bound to a RoundRobin element with a reference to N indexes starting from 1. The last things that I want to say about queryEndpoints is that the HTTP engine httpLbs used in it was taken from package http-client. It means that all requests are asynchronous and with adjustable timeouts. All exceptions that function query may produce are properly caught with functions catchBadResponseOwn and catchBadResponseRemote: they never return broken data replacing it with the existing good data instead. Now let's get back to the mysterious and frightening nginx directives haskell_var_nocacheable and haskell_var_compensate_uri_changes from file nginx-lmr-http-rules.conf and show content of file nginx-lmr-server-rules.conf.

set $debug 1;

location ~ '^/(Read|Write)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)$' {
    internal;

    set $op      $1;
    set $hint    $2;
    set $label   $3;
    set $seqn    $4;
    set $key     $5;
    set $start   $6;
    set $idx     $7;
    set $backend $8;
    set $stat    $9;

    set $head /$op/$hint/$label;

    recursive_error_pages on;
    error_page 404 =
        $head/$hs_seqn/$hs_key/$hs_start/$hs_idx/$hs_backend/NotFound;
    error_page 502 503 504 =
        $head/$hs_seqn/$hs_key/$hs_start/$hs_idx/$hs_backend/NotAccessible;
    error_page 409 =503 @Stop;

    proxy_intercept_errors on;

    haskell_run getMsg $hs_msg
        'Msg { op = $op
             , hnt = "$hint"
             , label = "$label"
             , seqn = $seqn
             , key = $key
             , start = $start
             , idx = $idx
             , backend = "$backend"
             , status = $stat
             }';

    haskell_run getSeqn    $hs_seqn    $hs_msg;
    haskell_run getKey     $hs_key     $hs_msg;
    haskell_run getStart   $hs_start   $hs_msg;
    haskell_run getIdx     $hs_idx     $hs_msg;
    haskell_run getBackend $hs_backend $hs_msg;
    haskell_run getStatus  $hs_status  $hs_msg;

    set $debugFinalMsg $debug::$hs_status;

    if ($debugFinalMsg = 1::NonExistent) {
        echo_status 503;
        echo "No route found";
        echo_after_body "[$pid] $hs_msg";
        break;
    }

    if ($hs_status = NonExistent) {
        return 409;
    }

    if ($debug) {
        echo_after_body "[$pid] $hs_msg";
        break;
    }

    proxy_pass http://$hs_backend$request_uri;
}

location @Stop {
    return 503;
}

location /backends {
    echo $hs_all_backends;
}

location /blacklist {
    haskell_run getBlacklist $hs_blacklist '';
    echo $hs_blacklist;
}

location /read {
    if ($arg_label = '') {
        return 400;
    }
    if ($arg_label ~ /) {
        return 400;
    }

    rewrite ^ /Read/any/$arg_label/0/0/0/0/START/Ok last;
}

location /write {
    if ($arg_hint ~ /) {
        return 400;
    }
    set $hint $arg_hint;
    if ($arg_hint = '') {
        set $hint default;
    }

    rewrite ^ /Write/$hint/any/0/0/0/0/START/Ok last;
}

location /conf {
    haskell_run getOwnBackends $hs_own_backends '';
    echo $hs_own_backends;
}

Both the /read and the /write requests get rewritten to a regexp location that starts with /Read or /Write, the rewritten request gets supplied with the hint and the label with four zeros and the mysterious tail /START/Ok. What going on here is starting of the loop where suitable media managers will be found or not found and the request's operation (read or write) will be successfully executed or failed! The four zeros are starting values for the loop: sequential number, key — index of the current search position in the innermost Map of type Routes, i.e. Possession or distance, start — index that will be assigned by the round-robin element from the RRRoutes on the first try if number of suitable media managers is more than one, and index — the current offset from the start value that will also be assigned from within the haskell module. The START value is rather arbitrary: this is an initial value of the backend to proxy_pass to and will be assigned from the haskell part, Ok corresponds to the internal state of the loop. The loop itself is encoded in the regexp location. Ability to iterate over the loop is the result of internal redirections provided by error_page directives which redirect to the same location when backends respond with bad statuses. But internal redirections in nginx have shortcomings: values of variables can be either cached forever or the variable's handler will be accessed on every access to the variable. We need somewhat in the middle: caching variables within a single internal redirection and updating them on every new internal redirection. This is exactly what directive haskell_var_nocacheable does! Another shortcoming of the nginx internal redirections is their finite number (10 or so): there is an internal counter uri_changes bound to the request's context, it gets decremented on every new internal redirection and when it reaches value 0 nginx finalizes the request. We cannot say how many internal redirections we need: apparently 10 can be not enough, and static limit is not good for us at all. Directive haskell_var_compensate_uri_changes makes the variable's handler increment (if possible) the request's uri_change value. As soon as variable $hs_msg is also nocacheable, its handler will be accessed once per single internal redirection thus compensating uri_changes decrements. Thus, the two directives haskell_var_nocacheable and haskell_var_compensate_uri_changes can make error_page internal redirections Turing-complete! New parameters of the loop including the most important one — the backend where to proxy_pass to, are received via a synchronous call to a haskell handler getMsg. Function getMsg is wrapped in IO Monad, however it does not make unsafe or unpredictable side effects like reading and writing from network sockets or files: it only accesses round-robin elements in the routes data and may also alter the global reference blacklist. This means that we can safely regard getMsg as safe in the sense of synchronicity. Here is how getMsg defined.

readMsg = readDef (Msg Read "" "" 0 0 0 0 "" NotReadable) . C8.unpack

writeMsg = return . C8L.pack . show

writeFinalMsg m = writeMsg m { backend = "STOP", status = NonExistent }

getMsg (readMsg -> m@Msg { status = NotReadable }) =
    writeFinalMsg m
getMsg (readMsg -> m@(Msg op hnt label seqn key start idx b st)) = do
    when (st == NotAccessible) $ do
        bl <- readIORef blacklist
        when (b `M.notMember` bl) $
            getCurrentTime >>= modifyIORef' blacklist . M.insert b
    (getRoutes seqn >=> return . rSelect op >=>
        return . second (M.lookup hnt >=> M.lookup label) -> r) <-
            readIORef routes
    case r >>= \x@(_, d) -> (x, ) <$> (d >>= elemAt key) of
        Nothing -> writeFinalMsg m
        Just ((n, fromJust -> d), (_, gr)) -> do
            (s, i, b) <- if st == NotFound
                             then return (0, 0, Nothing)
                             else getNextInGroup start
                                      (advanceIdx start st idx) gr
            case b of
                Nothing -> do
                    ((s, i, b), k) <- getNext (key + 1) d
                    case b of
                        Nothing -> do
                            unblacklistAll d
                            writeFinalMsg m { seqn = n }
                        Just v -> writeMsg $ Msg op hnt label n k s i v Ok
                Just v -> writeMsg $ Msg op hnt label n key s i v Ok
    where rSelect Read  = second fst
          rSelect Write = second snd
          getRoutes v (a@(x, _), b@(y, _)) | v == 0 = Just b
                                           | v == y = Just b
                                           | v == x = Just a
                                           | otherwise = Nothing
          elemAt i m | i < M.size m = Just $ M.elemAt i m
                     | otherwise = Nothing
          getNextInGroup _ 1 (_, Nothing) =
              return (0, 0, Nothing)
          getNextInGroup _ _ (dst, Nothing) =
              (0, 0, ) <$> ckBl (head dst)  -- using head is safe here
                                            -- because dst cannot be []
          getNextInGroup s i (dst, Just rr) = do
              ns <- if s == 0 then select rr else return s
              ((i +) . length -> ni, headDef Nothing -> d) <-
                  span isNothing <$> mapM ckBl
                        (take (length dst - i) $ drop (ns - 1 + i) $ cycle dst)
              return (ns, ni, d)
          advanceIdx 0 Ok = const 0
          advanceIdx 0 _  = const 1
          advanceIdx _ _  = succ
          ckBl d = do
              bl <- readIORef blacklist
              case M.lookup d bl of
                  Nothing -> return $ Just d
                  Just t -> do
                      now <- getCurrentTime
                      (fromIntegral -> bli) <- readIORef blInterval
                      if diffUTCTime now t > bli
                          then do
                              modifyIORef' blacklist $ M.delete d
                              return $ Just d
                          else return Nothing
          getNext k d = do
              (length -> nk, headDef (0, 0, Nothing) -> d) <-
                  span (\(_, _, b) -> isNothing b) <$>
                      mapM (getNextInGroup 0 0) (M.elems $ M.drop k d)
              return (d, k + nk)
          unblacklistAll = mapM_ $
              mapM_ (modifyIORef' blacklist . M.delete) . fst
ngxExportIOYY 'getMsg

The function tries to find a backend in the routes data according to received data: hint, label, sequential number etc. In case of any errors it writeFinalMsg and the nginx loop finishes. The data is sent from the nginx part as a message of type Msg and returned back from getMsg in the same type.

data BackendStatus = Ok             -- In / Out
                   | NotFound       -- In
                   | NotAccessible  -- In
                   | NotReadable    -- In
                   | NonExistent    -- Out
                   deriving (Read, Show, Eq)

data Msg = Msg { op      :: Op
               , hnt     :: Hint
               , label   :: Label
               , seqn    :: SeqNumber
               , key     :: Int
               , start   :: Int
               , idx     :: Int
               , backend :: Destination
               , status  :: BackendStatus
               } deriving (Read, Show)

The nginx part extracts separate values from the message using special getters.

getSeqn = C8L.pack . show . seqn . readMsg
ngxExportYY 'getSeqn

getKey = C8L.pack . show . key . readMsg
ngxExportYY 'getKey

getStart = C8L.pack . show . start . readMsg
ngxExportYY 'getStart

getIdx = C8L.pack . show . idx . readMsg
ngxExportYY 'getIdx

getBackend = C8L.pack . backend . readMsg
ngxExportYY 'getBackend

getStatus = C8L.pack . show . status . readMsg
ngxExportYY 'getStatus

Locations /conf and /blacklist require additional getters.

getOwnBackends = const $
    encode . fromMaybe M.empty . M.lookup Own <$> readIORef allBackends
ngxExportIOYY 'getOwnBackends

getBlacklist = const $
    encode <$> readIORef blacklist
ngxExportIOYY 'getBlacklist

I won't explain how getMsg works in details: it must be simple. Instead, I'll move to the module's compilation requirements and then make some tests in an environment that emulates what was depicted in the image from the beginning of the article. The requirements of lmr.hs are: ghc-8.0.1 or higher, containers-0.5.8.1 or higher, modules bytestring, async, aeson, http-client, roundRobin, safe and ngx-export. They all can be installed with cabal, but make sure that all depended modules were (re)installed with appropriate containers module's version. The lmr.hs can be compiled with command

ghc -O2 -dynamic -shared -fPIC -lHSrts_thr-ghc$(ghc --numeric-version) lmr.hs -o lmr.so -fforce-recomp

ghc -O2 -dynamic -shared -fPIC -lHSrts_thr_debug-ghc$(ghc --numeric-version) lmr.hs -o lmr.so -fforce-recomp -eventlog

if you want to collect haskell events to analyze performance and GC further (in this case you must uncomment the line with directive haskell rts_options -l in file nginx-lmr-http-rules.conf and make sure that the user of an nginx worker process, i.e. nginx, may write into the current directory). My nginx build configuration is

/usr/local/nginx/sbin/nginx -V
nginx version: nginx/1.10.2
built by gcc 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC) 
configure arguments: --add-module=/home/lyokha/devel/nginx-haskell-module --add-module=../echo-nginx-module-0.60

Let's emulate media managers in a dedicated nginx configuration file nginx-lmr-backends.conf.

user                    nginx;
worker_processes        2;

events {
    worker_connections  1024;
}

http {
    default_type        application/octet-stream;
    sendfile            on;

    server {
        listen       8011;
        server_name  backend;

        location ~ ^/(?:read|write)$ {
            #echo "In 8011";
            return 404;
        }

        location /conf {
            echo
            '
                {
                  "timestamp": 1423432,
                  "labels": {
                    "dir_1" : {
                      "mode": "RW",
                      "hint": ["default"]
                    },
                    "dir_3" : {
                      "mode": "RO",
                      "hint": ["default", "fast"]
                    }
                  }
                }
            ';
        }
    }

    server {
        listen       8012;
        server_name  backend;

        location ~ ^/(?:read|write)$ {
            #echo "In 8012";
            return 404;
        }

        location /conf {
            echo
            '
                {
                  "timestamp": 1423432,
                  "labels": {
                    "dir_1" : {
                      "mode": "RO",
                      "hint": ["default"]
                    },
                    "dir_2" : {
                      "mode": "RW",
                      "hint": ["default", "fast"]
                    }
                  }
                }
            ';
        }
    }

    server {
        listen       8013;
        server_name  backend;

        location ~ ^/(?:read|write)$ {
            echo "In 8013";
        }

        location /conf {
            echo
            '
                {
                  "timestamp": 1423432,
                  "labels": {
                    "dir_4" : {
                      "mode": "RW",
                      "hint": ["default"]
                    },
                    "dir_3" : {
                      "mode": "RO",
                      "hint": ["default", "fast"]
                    },
                    "dir_10" : {
                      "mode": "RO",
                      "hint": ["default", "fast"]
                    }
                  }
                }
            ';
        }
    }

    server {
        listen       8021;
        server_name  backend;

        location ~ ^/(?:read|write)$ {
            echo "In 8021";
        }

        location /conf {
            echo
            '
                {
                  "timestamp": 1423432,
                  "labels": {
                    "dir_1" : {
                      "mode": "RW",
                      "hint": ["default"]
                    },
                    "dir_2" : {
                      "mode": "RW",
                      "hint": ["default", "fast"]
                    },
                    "dir_5" : {
                      "mode": "RW",
                      "hint": ["default"]
                    }
                  }
                }
            ';
        }
    }

    server {
        listen       8022;
        server_name  backend;

        location ~ ^/(?:read|write)$ {
            echo "In 8022";
        }

        location /conf {
            echo
            '
                {
                  "timestamp": 1423432,
                  "labels": {
                    "dir_1" : {
                      "mode": "RW",
                      "hint": ["default"]
                    },
                    "dir_2" : {
                      "mode": "RW",
                      "hint": ["default", "fast"]
                    }
                  }
                }
            ';
        }
    }

    server {
        listen       8031;
        server_name  backend;

        location ~ ^/(?:read|write)$ {
            echo "In 8031";
        }

        location /conf {
            echo
            '
                {
                  "timestamp": 1423432,
                  "labels": {
                    "dir_3" : {
                      "mode": "RW",
                      "hint": ["default"]
                    },
                    "dir_2" : {
                      "mode": "RO",
                      "hint": ["default", "fast"]
                    },
                    "dir_4" : {
                      "mode": "RW",
                      "hint": ["default"]
                    }
                  }
                }
            ';
        }
    }

    server {
        listen       8032;
        server_name  backend;

        location ~ ^/(?:read|write)$ {
            echo "In 8032";
        }

        location /conf {
            echo
            '
                {
                  "timestamp": 1423432,
                  "labels": {
                    "dir_10" : {
                      "mode": "RW",
                      "hint": ["default", "fast"]
                    }
                  }
                }
            ';
        }
    }
}

Media managers located by port numbers 8011, 8012 and 8013 are LMM 1, LMM 2 and LMM 3 in terms of the image, 8021 and 8022 are LMM 4 and LMM 5, 8031 and 8032 are LMM 6 and LMM 7. Read and write requests to them return HTTP status 200 and a short message about their location, except for 8011 and 8012 which return status 404. Let the first nginx configuration shown in the article (with data Conf) be the configuration for LMR 1 — nginx-lmr-8010.conf. We also need configurations for LMR 2 and LMR 3 — nginx-lmr-8020.conf and nginx-lmr-8030.conf that are clones of nginx-lmr-8010.conf with correctly adjusted values of partners and backends. As a superuser move lmr.so to directory /var/lib/nginx and start nginx.

cp -i lmr.so /var/lib/nginx
/usr/local/nginx/sbin/nginx -c /home/lyokha/devel/nginx-haskell-module/examples/labeledMediaRouting/nginx-lmr-8010.conf

(Here paths on my computer are shown, they may differ on yours). Let' look at configuration of our media router.

curl -s 'http://localhost:8010/backends' | jq .
[
  {},
  {}
]
curl -s 'http://localhost:8010/conf' | jq .
{
  "127.0.0.1:8011": {
    "labels": {},
    "timestamp": 0
  },
  "127.0.0.1:8012": {
    "labels": {},
    "timestamp": 0
  },
  "127.0.0.1:8013": {
    "labels": {},
    "timestamp": 0
  }
}

(Command jq is an excellent command-line JSON parser that can pretty-print and colorize JSON objects.) Here we see that the both parts of the routes are empty and all the 3 media managers have zero configuration. Anyway, let's try to read from yet inexistent dir_1.

curl -D- 'http://localhost:8010/read?label=dir_1'
HTTP/1.1 503 Service Temporarily Unavailable
Server: nginx/1.10.2
Date: Wed, 04 Jan 2017 14:56:40 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Connection: keep-alive

No route found
[28548] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 0, key = 0, start = 0, idx = 0, backend = "STOP", status = NonExistent}

Good, No routes found as expected, there is also a debug message with the worker's PID and data Msg returned by function getMsg. Let's start the backends and wait at most 20 sec until the media router find them.

/usr/local/nginx/sbin/nginx -c /home/lyokha/devel/nginx-haskell-module/examples/labeledMediaRouting/nginx-lmr-backends.conf

Waiting...

curl -s 'http://localhost:8010/backends' | jq .
[
  {
    "any": {
      "dir_1": [
        [
          {
            "tag": "Own"
          },
          [
            "127.0.0.1:8011",
            "127.0.0.1:8012"
          ]
        ]
      ],
      "dir_10": [
        [
          {
            "tag": "Own"
          },
          [
            "127.0.0.1:8013"
          ]
        ]
      ],
      "dir_2": [
        [
          {
            "tag": "Own"
          },
          [
            "127.0.0.1:8012"
          ]
        ]
      ],
      "dir_3": [
        [
          {
            "tag": "Own"
          },
          [
            "127.0.0.1:8011",
            "127.0.0.1:8013"
          ]
        ]
      ],
      "dir_4": [
        [
          {
            "tag": "Own"
          },
          [
            "127.0.0.1:8013"
          ]
        ]
      ]
    }
  },
  {
    "default": {
      "any": [
        [
          {
            "tag": "Own"
          },
          [
            "127.0.0.1:8011",
            "127.0.0.1:8012",
            "127.0.0.1:8013"
          ]
        ]
      ]
    },
    "fast": {
      "any": [
        [
          {
            "tag": "Own"
          },
          [
            "127.0.0.1:8012"
          ]
        ]
      ]
    }
  }
]
curl -s 'http://localhost:8010/conf' | jq .
{
  "127.0.0.1:8011": {
    "labels": {
      "dir_1": {
        "mode": "RW",
        "hint": [
          "default"
        ]
      },
      "dir_3": {
        "mode": "RO",
        "hint": [
          "default",
          "fast"
        ]
      }
    },
    "timestamp": 1423432
  },
  "127.0.0.1:8012": {
    "labels": {
      "dir_1": {
        "mode": "RO",
        "hint": [
          "default"
        ]
      },
      "dir_2": {
        "mode": "RW",
        "hint": [
          "default",
          "fast"
        ]
      }
    },
    "timestamp": 1423432
  },
  "127.0.0.1:8013": {
    "labels": {
      "dir_4": {
        "mode": "RW",
        "hint": [
          "default"
        ]
      },
      "dir_3": {
        "mode": "RO",
        "hint": [
          "default",
          "fast"
        ]
      },
      "dir_10": {
        "mode": "RO",
        "hint": [
          "default",
          "fast"
        ]
      }
    },
    "timestamp": 1423432
  }
}

Ouch! Much longer output, and we actually see the structure of LMM 1, LMM 2 and LMM 3 from the image and the media storages behind them. Remember that 8011 and 8012 return status 404?

curl -D- 'http://localhost:8010/read?label=dir_1'
HTTP/1.1 503 Service Temporarily Unavailable
Server: nginx/1.10.2
Date: Wed, 04 Jan 2017 15:09:18 GMT
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Connection: keep-alive

No route found
[28548] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 1, key = 0, start = 1, idx = 0, backend = "STOP", status = NonExistent}

What about simple dir_10 and dir_3 with 2 media managers 8011 and 8013?

curl 'http://localhost:8010/read?label=dir_10'
In 8013
[28548] Msg {op = Read, hnt = "any", label = "dir_10", seqn = 1, key = 0, start = 0, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl 'http://localhost:8010/read?label=dir_10'
In 8013
[28548] Msg {op = Read, hnt = "any", label = "dir_10", seqn = 1, key = 0, start = 0, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl 'http://localhost:8010/read?label=dir_10'
In 8013
[28548] Msg {op = Read, hnt = "any", label = "dir_10", seqn = 1, key = 0, start = 0, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl 'http://localhost:8010/read?label=dir_3'
No route found
[28548] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 1, idx = 0, backend = "STOP", status = NonExistent}
curl 'http://localhost:8010/read?label=dir_3'
In 8013
[28548] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl 'http://localhost:8010/read?label=dir_3'
No route found
[28548] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 1, idx = 0, backend = "STOP", status = NonExistent}
curl 'http://localhost:8010/read?label=dir_3'
In 8013
[28548] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8013", status = Ok}

(I no longer print HTTP headers to make the output more compact.) Label dir_10 behaves as expected, but label dir_3 gets found and then not found in cycle. Is this what we expected? Yes! Because our backend is actually misconfigured. Remember that when a backend return status 404 we expect that any other backend from this area (i.e. with specific Possession key) will also return 404 and so we skip to another area. In this situation we have a single area and a directory dir_3 with 2 backends: it means that they are chosen in round-robin manner (cycling of start value in the debug message witnesses that). When backend 8011 gets chosen it returns 404 and getMsg skips to the next (inexistent) area without checking other backends in this area, finally emitting the final message with status NonExistent. When the other backend 8013 gets chosen it returns good message In 8013. In the next request the round-robin mechanism choses backend 8011 and so on. Let 8011 return 503: this must fix the issue and the 8013 must be chosen every time with blacklisting of 8011. Replace the line with return 404 in the read/write location of server 8011 in file nginx-lmr-backends.conf with return 503 and restart backends only.

pkill -HUP -f '/usr/local/nginx/sbin/nginx.*nginx-lmr-backends.conf'

The behavior of the dir_3 must alter immediately because the change of the returned status did not change the structure of the routes and hence there is no need to wait.

curl 'http://localhost:8010/read?label=dir_3'
In 8013
[28547] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 1, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl 'http://localhost:8010/read?label=dir_3'
In 8013
[28547] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl 'http://localhost:8010/read?label=dir_3'
In 8013
[28547] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 1, idx = 1, backend = "127.0.0.1:8013", status = Ok}

Correct. Reaching by idx value 1 prompts us that there are blacklisted backends (8011?). Check it.

curl -s 'http://localhost:8010/blacklist' | jq .
{
  "127.0.0.1:8011": "2017-01-04T15:46:03.805737876Z"
}

Let's make 8011 return a good message with status 200. For that, uncomment the line with In 8011 in file nginx-lmr-backends.conf and comment the next line with return 503 (that was return 404 in the beginning). Restart backends only (with the specially crafted signal -HUP as was shown above), wait 1 min (or nothing if the request come to the unaffected nginx worker) and test dir_3 again.

curl 'http://localhost:8010/read?label=dir_3'
In 8013
[28547] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl 'http://localhost:8010/read?label=dir_3'
In 8011
[28547] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 1, idx = 0, backend = "127.0.0.1:8011", status = Ok}
curl 'http://localhost:8010/read?label=dir_3'
In 8013
[28547] Msg {op = Read, hnt = "any", label = "dir_3", seqn = 1, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8013", status = Ok}
curl -s 'http://localhost:8010/blacklist' | jq .
{}

Finally we get a healthy round-robin cycle with the empty blacklist. Let's now undo all changes in backends and start the other 2 media routers.

pkill -HUP -f '/usr/local/nginx/sbin/nginx.*nginx-lmr-backends.conf'
/usr/local/nginx/sbin/nginx -c /home/lyokha/devel/nginx-haskell-module/examples/labeledMediaRouting/nginx-lmr-8020.conf 
/usr/local/nginx/sbin/nginx -c /home/lyokha/devel/nginx-haskell-module/examples/labeledMediaRouting/nginx-lmr-8030.conf

Now they'll start to negotiate with each other every 20 sec per router. You will see that when tailing -f file /var/log/nginx/lmr-access.log or in a sniffer output. Let's look how they see dir_1 for reading requests.

curl -s 'http://localhost:8010/backends' | jq .[0].any.dir_1
[
  [
    {
      "tag": "Own"
    },
    [
      "127.0.0.1:8011",
      "127.0.0.1:8012"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8020"
    },
    [
      "127.0.0.1:8021",
      "127.0.0.1:8022"
    ]
  ]
]
curl -s 'http://localhost:8020/backends' | jq .[0].any.dir_1
[
  [
    {
      "tag": "Own"
    },
    [
      "127.0.0.1:8021",
      "127.0.0.1:8022"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8010"
    },
    [
      "127.0.0.1:8011",
      "127.0.0.1:8012"
    ]
  ]
]
curl -s 'http://localhost:8030/backends' | jq .[0].any.dir_1
[
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8010"
    },
    [
      "127.0.0.1:8011",
      "127.0.0.1:8012"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8020"
    },
    [
      "127.0.0.1:8021",
      "127.0.0.1:8022"
    ]
  ]
]

As soon as backends 8011 and 8012 return 404 they all must behave similarly on the read requests.

curl 'http://localhost:8010/read?label=dir_1'
In 8021
[28547] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 2, key = 1, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8010/read?label=dir_1'
In 8022
[28547] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 2, key = 1, start = 2, idx = 0, backend = "127.0.0.1:8022", status = Ok}
curl 'http://localhost:8010/read?label=dir_1'
In 8021
[28547] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 2, key = 1, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8020/read?label=dir_1'
In 8021
[18664] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 2, key = 0, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8020/read?label=dir_1'
In 8022
[18664] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 2, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8022", status = Ok}
curl 'http://localhost:8020/read?label=dir_1'
In 8021
[18664] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 2, key = 0, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8030/read?label=dir_1'
In 8021
[18701] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 1, key = 1, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8030/read?label=dir_1'
In 8022
[18701] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 1, key = 1, start = 2, idx = 0, backend = "127.0.0.1:8022", status = Ok}
curl 'http://localhost:8030/read?label=dir_1'
In 8021
[18701] Msg {op = Read, hnt = "any", label = "dir_1", seqn = 1, key = 1, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}

Label dir_2 must look differently.

curl -s 'http://localhost:8010/backends' | jq .[0].any.dir_2
[
  [
    {
      "tag": "Own"
    },
    [
      "127.0.0.1:8012"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8020"
    },
    [
      "127.0.0.1:8021",
      "127.0.0.1:8022"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8030"
    },
    [
      "127.0.0.1:8031"
    ]
  ]
]
curl -s 'http://localhost:8020/backends' | jq .[0].any.dir_2
[
  [
    {
      "tag": "Own"
    },
    [
      "127.0.0.1:8021",
      "127.0.0.1:8022"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8010"
    },
    [
      "127.0.0.1:8012"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8030"
    },
    [
      "127.0.0.1:8031"
    ]
  ]
]
curl -s 'http://localhost:8030/backends' | jq .[0].any.dir_2
[
  [
    {
      "tag": "Own"
    },
    [
      "127.0.0.1:8031"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8010"
    },
    [
      "127.0.0.1:8012"
    ]
  ],
  [
    {
      "tag": "Remote",
      "contents": "127.0.0.1:8020"
    },
    [
      "127.0.0.1:8021",
      "127.0.0.1:8022"
    ]
  ]
]

Accessing dir_2 from routers 8010 and 8020 must return In 8021 and In 8022 whereas 8030 must always return In 8031. Check it.

curl 'http://localhost:8010/read?label=dir_2'
In 8021
[28548] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 2, key = 1, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8010/read?label=dir_2'
In 8022
[28548] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 2, key = 1, start = 2, idx = 0, backend = "127.0.0.1:8022", status = Ok}
curl 'http://localhost:8010/read?label=dir_2'
In 8021
[28548] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 2, key = 1, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8020/read?label=dir_2'
In 8022
[18662] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 2, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8022", status = Ok}
curl 'http://localhost:8020/read?label=dir_2'
In 8021
[18662] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 2, key = 0, start = 1, idx = 0, backend = "127.0.0.1:8021", status = Ok}
curl 'http://localhost:8020/read?label=dir_2'
In 8022
[18662] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 2, key = 0, start = 2, idx = 0, backend = "127.0.0.1:8022", status = Ok}
curl 'http://localhost:8030/read?label=dir_2'
In 8031
[18703] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 1, key = 0, start = 0, idx = 0, backend = "127.0.0.1:8031", status = Ok}
curl 'http://localhost:8030/read?label=dir_2'
In 8031
[18703] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 1, key = 0, start = 0, idx = 0, backend = "127.0.0.1:8031", status = Ok}
curl 'http://localhost:8030/read?label=dir_2'
In 8031
[18703] Msg {op = Read, hnt = "any", label = "dir_2", seqn = 1, key = 0, start = 0, idx = 0, backend = "127.0.0.1:8031", status = Ok}

By the way you may notice that requests to the routers 8010 and 8020 receive sequential number 2 from getMsg whereas requests to 8030 receive 1. This is not a surprise: we started 8020 and 8030 after 8010 and it replaced its configuration only once. The router 8020 was very quick to initialize its configuration before it was changed again by starting of the 8030. The 8030 was starting after the others and its configuration did not change since then. We still did not test several important scenarios here, for example addition of new labels and replacing timestamps in /conf responses from media managers, but I won't do that: the article is already too big and the shown tests do enough to give a taste of the proposed routing approach. The source code and nginx configuration files for the tests can be found here.

понедельник, 14 августа 2017 г.