Once arbtt-capture is running, it records data without any configuration. Only the analysis of the recorded data requires you to configure the categorizer. Every time the categorizer (arbtt-stats) runs, it applies the categorization rules to all recorded data and tags it accordingly. Thus, if you improve your categorization rules later, they will also apply to all previously recorded samples.
The configuration file needs to be placed in ~/.arbtt/categorize.cfg. An example is included in the source distribution and reproduced here as Example 1, “categorize.cfg”. It should be more enlightening than a formal description.
Example 1. categorize.cfg
-- -*- mode: haskell; -*-
-- Comments in this file use the Haskell syntax:
-- A "--" comments the rest of the line.
-- A set of {- ... -} comments out a group of lines.

-- This defines some aliases, to make the reports look nicer:
aliases (
        "sun-awt-X11-XFramePeer"  -> "java",
        "sun-awt-X11-XDialogPeer" -> "java",
        "sun-awt-X11-XWindowPeer" -> "java",
        "gramps.py"               -> "gramps",
        "___nforschung"           -> "ahnenforschung",
        "Pidgin"                  -> "pidgin"
        )

-- A rule that probably everybody wants. Being inactive for over a minute
-- causes this sample to be ignored by default.
$idle > 60 ==> tag inactive,

-- A rule that matches on a list of strings
current window $program == ["Navigator","galeon"] ==> tag Web,

-- Use condition bindings to reduce duplication
condition isJava = current window $program == ["sun-awt-X11-XFramePeer",
                                               "sun-awt-X11-XDialogPeer",
                                               "sun-awt-X11-XWindowPeer"] in
  $isJava && current window $title == "I3P" ==> tag Program:I3P,

current window $program == "sun-awt-X11-XDialogPeer" && current window $title == " "
  && any window $title == "I3P"
  ==> tag Program:I3P,

-- Simple rule that just tags the current program
tag Program:$current.program,

-- Another simple rule, just tags the current desktop (a.k.a. workspace)
tag Desktop:$desktop,

-- I'd like to know what evolution folders I'm working in. But when sending a
-- mail, the window title only contains the (not very helpful) subject. So I do
-- not tag necessarily by the active window title, but the title that contains
-- the folder
current window $program == "evolution" &&
any window ($program == "evolution" && $title =~ /^(.*) \([0-9]+/)
  ==> tag Evo-Folder:$1,

-- A general rule that works well with gvim and gnome-terminal and tells me
-- what project I'm currently working on
current window $title =~ m!(?:~|home/jojo)/projekte/(?:programming/(?:haskell/)?)?([^/)]*)!
  ==> tag Project:$1,
current window $title =~ m!(?:~|home/jojo)/debian!
  ==> tag Project:Debian,

-- This was a frequently looked-at pdf-File
current window $title =~ m!output.pdf! &&
any window ($title =~ /nforschung/)
  ==> tag Project:ahnenforschung,

-- My diploma thesis is in a different directory
current window $title =~ [ m!(?:~|home/jojo)/dokumente/Uni/DA!
                         , m!Diplomarbeit.pdf!
                         , m!LoopSubgroupPaper.pdf! ]
  ==> tag Project:DA,

current window $title =~ m!TDM!
  ==> tag Project:TDM,

( $date >= 2010-08-01 && $date <= 2010-12-01 &&
  ( current window $program == "sun-awt-X11-XFramePeer" && current window $title == "I3P" ||
    current window $program == "sun-awt-X11-XDialogPeer" && current window $title == " " && any window $title == "I3P" ||
    current window $title =~ m!(?:~|home/jojo)/dokumente/Uni/SA! ||
    current window $title =~ m!Isabelle200! ||
    current window $title =~ m!isar-ref.pdf! ||
    current window $title =~ m!document.pdf! ||
    current window $title =~ m!outline.pdf! ||
    current window $title =~ m!Studienarbeit.pdf!
  )
) ==> tag Project:SA,

-- Out of curiosity: what percentage of my time am I actually coding Haskell?
current window ($program == "gvim" && $title =~ /^[^ ]+\.hs \(/ )
  ==> tag Editing-Haskell,

{-
-- Example of time-related rules. I do not use these myself.

-- To be able to match on the time of day, I introduce tags for that as well.
-- $time evaluates to local time.
$time >=  2:00 && $time <  8:00 ==> tag time-of-day:night,
$time >=  8:00 && $time < 12:00 ==> tag time-of-day:morning,
$time >= 12:00 && $time < 14:00 ==> tag time-of-day:lunchtime,
$time >= 14:00 && $time < 18:00 ==> tag time-of-day:afternoon,
$time >= 18:00 && $time < 22:00 ==> tag time-of-day:evening,
$time >= 22:00 || $time <  2:00 ==> tag time-of-day:late-evening,

-- This tag always refers to the last 24h
$sampleage <= 24:00 ==> tag last-day,

-- To categorize by calendar periods (months, weeks, or arbitrary periods),
-- I use $date variable, and some auxiliary functions. All these functions
-- evaluate dates in local time. Set TZ environment variable if you need
-- statistics in a different time zone.

-- You can compare dates:
$date >= 2001-01-01 ==> tag this_century,
-- You have to write them in YYYY-MM-DD format, else they will not be recognized.

-- “format $date” produces a string with the date in ISO 8601 format
-- (YYYY-MM-DD), it may be compared with strings. For example, to match
-- everything on and after a particular date I can use
format $date =~ /.*-03-19/ ==> tag period:on_a_special_day,
-- but note that this is a rather expensive operation and will slow down your
-- data processing considerably.

-- “day of month $date” gives the day of month (1..31),
-- “day of week $date” gives a sequence number of the day of week
-- (1..7, Monday is 1):
(day of month $date == 13) && (day of week $date == 5) ==> tag day:friday_13,

-- “month $date” gives a month number (1..12), “year $date” gives a year:
month $date == 1 ==> tag month:January,
month $date == 2 ==> tag month:February,
year $date == 2010 ==> tag year:2010,

-- “$now” evaluates to the current time
day of month $now == day of month $date ==> tag current-day,
month $now == month $date ==> tag current-month,
year $now == year $date ==> tag current-year,
-}
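A data sample consists of the time of recording, the time passed since the user’s last action, the name of the current workspace and the list of windows. For each window, the title and the program name are available, together with the further attributes exposed through the variables $wdesktop, $active and $hidden.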
Based on this information and on the rules in categorize.cfg, the categorizer (arbtt-stats) assigns tags to each sample.
A simple rule consists of a condition followed by an arrow (==>) and a tag expression (the tag keyword followed by a tag name). The rule ends with a comma (,).
The keyword tag, usually preceded by a condition, assigns a tag to the sample. The tag keyword is followed by a tag name (any sequence of alphanumeric symbols, underscores and hyphens). If the tag name contains a colon (:), the part of the name before the colon is considered to be the tag category.
For example, this rule
month $date == 1 ==> tag month:January,
if it succeeds, assigns the tag January in the category month.
If the tag has a category, it will only be assigned if no other tag of that category has been assigned. This means that for each sample and each category, there can be at most one tag in that category. Tags can contain references to group matches in the regular expressions used in conditions ($1, $2, ...). Tags can also reference some variables such as the window title ($current.title) or the program name ($current.program).
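As a small illustration of the one-tag-per-category behaviour, here is a sketch with two rules in a made-up category Activity. The tag names are hypothetical, and the sketch assumes that rules are applied in the order in which they appear in the file.

```
-- Both rules can match the same sample (gvim editing a .hs file).
-- Only one tag of category "Activity" is kept for that sample; assuming
-- rules are applied top to bottom, that is Activity:editing.
current window $program == "gvim" ==> tag Activity:editing,
current window $title =~ /\.hs/   ==> tag Activity:haskell,

-- A tag without a category is not subject to this restriction.
current window $title =~ /\.hs/   ==> tag haskell,
```

If you want both pieces of information on the same sample, putting the tags into different categories (or using no category at all) is one way to get it.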
The variable $idle contains the idle time of the user, measured in seconds. Usually, it is used to assign the tag inactive, which is handled specially by arbtt-stats, as can be seen in Example 1, “categorize.cfg”.
When applying the rules, the categorizer has a notion of the window in scope, and the variables $title, $program, $wdesktop, $active and $hidden always refer to the window in scope. By default, no window is in scope. A condition that uses these variables should therefore be prefixed with either current window or any window to define their scope.
The name of the current desktop (or workspace) is available as $desktop.
For current window, the currently active window is in scope. If there is no such window, the condition is false.
For any window, the condition is applied to each window in turn, and if any of the windows matches, the result is true. If more than one window matches, it is not defined which match the variables $1, ... are taken from (see more about regular expressions below).
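The difference between the two scopes can be sketched with a few rules modelled on Example 1. The tag names and the title pattern "Inbox" are made up for illustration.

```
-- True only if the active window belongs to evolution.
current window $program == "evolution" ==> tag Mail-active,

-- True if any window in the sample belongs to evolution, active or not.
any window $program == "evolution" ==> tag Mail-open,

-- Parentheses after "any window" keep both tests on the same window.
any window ($program == "evolution" && $title =~ /Inbox/) ==> tag Mail-inbox,
```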
The variable $time refers to the time of day of the sample (i.e. the time since 00:00 that day, local time), while $sampleage refers to the time span from when the sample was recorded until now, the time of evaluating the statistics. The latter variable is especially useful when passed to the --filter option of arbtt-stats. Both can be compared with expressions of the form hh:mm, for example
$time >= 8:00 && $time < 12:00 ==> tag time-of-day:morning
The variable $date refers to the date and time of the recorded sample. It can be compared with date literals of the form YYYY-MM-DD. Such a literal stands for midnight, so $date == 2001-01-01 will not do what you want, but $date >= 2001-01-01 && $date <= 2001-01-02 would. All dates are evaluated in local time.
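Because a date literal stands for midnight, a single day is most naturally written as a range between two consecutive midnights. A minimal sketch, with a made-up date and tag name:

```
-- Matches samples recorded on 19 March 2010 (local time): from that
-- day's midnight up to, but not including, the next midnight.
$date >= 2010-03-19 && $date < 2010-03-20 ==> tag day:special,
```

Using < for the upper bound leaves out a sample recorded exactly at midnight of the following day, which <= would include.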
The expression format $date evaluates to a string with the date formatted according to ISO 8601, i.e. as "YYYY-MM-DD". The 19th of March 2010 is formatted as "2010-03-19". A formatted date can be compared to strings, which may be useful for tagging particular date ranges. Note, however, that this is a rather expensive operation that can slow down your data processing.
The expression month $date evaluates to an integer from 1 to 12, corresponding to the month number. The expression year $date evaluates to an integer, the year number.
The expression day of month $date evaluates to an integer from 1 to 31, corresponding to the day of the month.
The expression day of week $date evaluates to an integer from 1 to 7, corresponding to the day of the week: Monday is 1, Sunday is 7. These expressions can be compared to integers.
The expression week of year $date evaluates to an integer from 0 to 53, corresponding to the week of the year. January 1 falls in week 0. All of these expressions are integers, and can be combined and compared as such.
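A few rules show how these integer expressions might be used. The tag names are made up, and the last rule assumes that week of year can be applied to $now in the same way that Example 1 applies month and year to it.

```
-- Saturday is 6 and Sunday is 7, since Monday is 1.
day of week $date >= 6 ==> tag weekend,

-- Samples from January to March of any year.
month $date <= 3 ==> tag quarter:first,

-- Samples with the same week number as today. This also matches the
-- same week number in other years unless the year is compared as well.
week of year $date == week of year $now ==> tag current-week,
```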
Expressions can be compared to literal values with the operators == (equal), /= (not equal), <, <=, >= and >. String expressions ($program, $title) can be matched against regular expressions with the =~ operator. With these operators, the right-hand side can be a comma-separated list of literals enclosed in square brackets ([..., ..., ]); the comparison succeeds if it succeeds for any of them.
Integer expressions can be combined with the operators + (addition), - (subtraction) and * (multiplication).
Regular expressions are written either between slashes (/regular expression/), or after the letter m followed by any symbol (m c regular expression c, where c is any symbol). The second appearance of that symbol ends the expression. You can find both variants in Example 1, “categorize.cfg”.
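The m form is convenient when the pattern itself contains slashes. The following sketch is modelled on Example 1; the directory layout ~/src is hypothetical.

```
-- "!" delimits the pattern, so the slashes in the path need no escaping.
-- The parenthesised group is available as $1 in the tag.
current window $title =~ m!~/src/([^/ ]*)! ==> tag Project:$1,
```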
Complex conditions may be constructed from simpler ones using the Boolean operators AND (&&), OR (||) and NOT (!), together with parentheses.
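As a sketch of how the Boolean operators might be combined, consider the following rules. The tag names are made up, and the exact placement of ! before a parenthesised condition is an assumption based on the description above.

```
-- gvim windows whose title does not look like a Haskell file.
current window $program == "gvim" && !(current window $title =~ /\.hs/)
  ==> tag Editing-other,

-- Parentheses make the grouping of && and || explicit.
$time < 8:00 || ($time >= 22:00 && $idle < 60) ==> tag odd-hours,
```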
You can define short-hand names for conditions using condition:
condition arbtt = current window $title =~ m/arbtt/ in {
  $arbtt && $time < 14:00 ==> tag arbtt-morning,
  $arbtt && $time > 14:00 ==> tag arbtt-afternoon
}
Anything that is a valid condition can be bound to a name in this way, and you can reference bound names in rules by prefixing them with a dollar sign ($).
categorize.cfg is a plain text file. Whitespace is insignificant and Haskell-style comments are allowed. A formal grammar is provided in Figure 1, “The formal grammar of categorize.cfg”.
Figure 1. The formal grammar of categorize.cfg
A String refers to a double-quoted string of characters, while a Literal is not quoted.
Tags may only consist of letters, dashes and underscores, or variable interpolations. A Tag may optionally be prefixed with a category, separated by a colon. The category itself follows the same lexical rules as the tag. A variable interpolation can be one of the following:
$1, $2, ...
$current.title
$current.program
A regular expression is, as in Perl, either enclosed in forward slashes or, alternatively, in any character of your choice with an m (for “match”) in front. This is handy if you need to use regular expressions that match directory names. Otherwise, the syntax of the regular expressions is that of Perl-compatible regular expressions.