Welcome!

Thank you for visiting the breakdance docs! Please let us know if you find any typos, misspellings or similar grammar mistakes as you are reading.

Pull requests are also appreciated.

Getting started

If you want to see how breakdance works, the CLI is the fastest way to get started.

Installing the CLI

Install the breakdance CLI with npm:

$ npm install --global breakdance-cli

This adds the bd command to your system path, allowing you to run the breakdance CLI from any directory.

$ bd
# also aliased as "breakdance" in case of conflicts
$ breakdance

Using the CLI

If you want to do a quick test-drive, add the following content to foo.html:

<h2 id=tables-hover-rows>Hover rows</h2>
<p>Add <code>.table-hover</code> to enable a hover state on table rows within a <code>&lt;tbody&gt;</code>.</p>
<div class=bs-example data-example-id=hoverable-table>
  <table class="table table-hover">
    <thead>
      <tr>
        <th>#</th>
        <th>First Name</th>
        <th>Last Name</th>
        <th>Username</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th scope=row>1</th>
        <td>Mark</td>
        <td>Otto</td>
        <td>@mdo</td>
      </tr>
      <tr>
        <th scope=row>2</th>
        <td>Jacob</td>
        <td>Thornton</td>
        <td>@fat</td>
      </tr>
      <tr>
        <th scope=row>3</th>
        <td>Larry</td>
        <td>the Bird</td>
        <td>@twitter</td>
      </tr>
    </tbody>
  </table>
</div>

Run breakdance

Next, from the command line run:

$ bd foo.html

If everything is installed properly, you should now see a foo.md file in the current working directory with something like the following:

## Hover rows

Add `.table-hover` to enable a hover state on table rows within a `<tbody>`.

| # | First Name | Last Name | Username |
| --- | --- | --- | --- |
| 1 | Mark | Otto | @mdo |
| 2 | Jacob | Thornton | @fat |
| 3 | Larry | the Bird | @twitter |

CLI options

Most of the breakdance options can be set from the command line. Play around with the options on the following help menu to customize output.

Help menu

Usage: $ breakdance [options] <file> <dest>

  file:  The HTML file to convert to markdown
  dest:  Name of the markdown file to create. By default
         the HTML filename is used with a .md extension

Options:

-c, --condense Collapse more than two newlines to only
               two newlines. Enabled by default
-d, --domain   Specify the root domain name to prefix onto
               "href" or "src" paths that do not start
               with "#" or contain "://"
-o, --omit     One or more tags to omit entirely from
               the HTML before converting to markdown.
-p, --pick     One or more tags to pick entirely from the
               HTML before converting to markdown.
--comments     Include HTML code comments in the generated
               markdown string. Disabled by default

Example

Tell breakdance to only render specific HTML elements by passing CSS selectors to the --pick flag (powered by cheerio).

For example, to only generate the <table> from the HTML in the previous example, you would run:

$ bd foo.html --pick table

Try it! It's addictive!

API Usage

Breakdance is a JavaScript-based node.js module that, once installed, can be added to any .js file using node's require() system by adding the following line of code:

var breakdance = require('breakdance');

That line of code exposes the "main export" of the breakdance library, which is a function that can be used in one of the following ways:

  1. called with a string to convert HTML to markdown directly, or
  2. treated as a constructor function and instantiated if you need to register plugins first

Example 1: Direct usage

var breakdance = require('breakdance');
console.log(breakdance('<strong>The freaks come out at night!</strong>'));
//=> '**The freaks come out at night!**'

Example 2: Constructor function

This is useful when you need to register plugins first, or you want to modify the parser or compiler before rendering.

var Breakdance = require('breakdance');
var breakdance = new Breakdance()
  .use(foo())
  .use(bar())
  .use(baz());

console.log(breakdance.render('<strong>The freaks come out at night!</strong>'));
//=> '**The freaks come out at night!**'

Breakdance AST

In addition to registering plugins, when instantiating you can also directly access the breakdance parser and compiler.

Logging out the AST is a great way to learn how breakdance works, and will be helpful in understanding how to customize behavior if you want to write plugins.

var Breakdance = require('breakdance');
var breakdance = new Breakdance(/* options */);
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');
console.log(ast);

var str = breakdance.compile(ast);
console.log(str);
//=> '**The Freaks Come Out at Night!**'

Setting options

A number of different options are available for configuring breakdance. If you need more than what the options provide, see the customizing breakdance section.

To set options, pass an object as the last argument to breakdance:

breakdance(string[, options]);

Example

var breakdance = require('breakdance');
var options = {domain: 'https://github.com'};
console.log(breakdance('<a href="/some-link"></a>', options));
//=> '[]

Options

comments

Type: boolean

Default: undefined

Include HTML code comments in the generated markdown string. Disabled by default.

console.log(breakdance('<strong>Foo</strong> <!-- bar -->', {comments: true}));
//=> '**Foo** <!-- bar -->'

condense

Type: boolean

Default: true

Collapse more than two newlines to only two newlines. Enabled by default.
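
The effect described above can be approximated with a single regular expression. This is a minimal sketch of the described behavior, not breakdance's actual implementation; the condense helper name is hypothetical.

```javascript
// Hypothetical helper illustrating what the "condense" option describes:
// runs of three or more newlines collapse to exactly two.
function condense(str) {
  return str.replace(/\n{3,}/g, '\n\n');
}

console.log(JSON.stringify(condense('Foo\n\n\n\nBar\n\nBaz')));
//=> "Foo\n\nBar\n\nBaz"
```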

domain

Type: string

Default: undefined

Specify the root domain name to prefix onto href or src paths that do not start with # or contain ://.
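
The rule above can be sketched as a small standalone function. The prefixDomain name and the way slashes are joined are assumptions made for illustration; breakdance may handle edge cases differently.

```javascript
// Hypothetical helper: prefix `domain` onto a path unless the path
// starts with "#" or already contains "://".
function prefixDomain(url, domain) {
  if (!domain || url.charAt(0) === '#' || url.indexOf('://') !== -1) {
    return url;
  }
  // join with exactly one slash between domain and path (assumption)
  return domain.replace(/\/+$/, '') + '/' + url.replace(/^\/+/, '');
}

console.log(prefixDomain('/some-link', 'https://github.com'));
//=> 'https://github.com/some-link'
console.log(prefixDomain('#anchor', 'https://github.com'));
//=> '#anchor'
console.log(prefixDomain('https://example.com/a', 'https://github.com'));
//=> 'https://example.com/a'
```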

keepEmpty

Type: string|array

Default: undefined

Selectively keep tags that are omitted by omitEmpty, so you don't need to redefine all of the omitted tags.

knownOnly

Type: boolean

Default: undefined

When true, breakdance will throw an error if any non-standard/custom HTML tags are encountered. If you find a tag that breakdance doesn't cover, but you think it should, please create an issue to let us know about it.

See the breakdance recipes for an example of how to add support for custom HTML elements.

leadingNewline

Type: boolean

Default: undefined

Add a newline at the beginning of the generated markdown string.

omit

Type: array|string

One or more tags to omit entirely from the HTML before converting to markdown.

Example

Given the following HTML:

<em>Foo</em> <strong class="xyz">Bar</strong> <em class="xyz">Baz</em>

You can do the following to selectively omit tags:

console.log(breakdance(html))
//=> '_Foo_ **Bar** _Baz_'
console.log(breakdance(html, {omit: 'em'}))
//=> '**Bar**'
console.log(breakdance(html, {omit: 'em[class="xyz"]'}))
//=> '_Foo_ **Bar**'
console.log(breakdance(html, {omit: '[class="xyz"]'}))
//=> '_Foo_'
console.log(breakdance(html, {omit: 'em,strong'}))
//=> ''
console.log(breakdance(html, {omit: ['em', 'strong']}))
//=> ''

pick

Type: array|string

One or more tags to pick entirely from the HTML before converting to markdown.

Example

Given the following HTML:

<em>Foo</em> <strong class="xyz">Bar</strong> <em class="xyz">Baz</em>

You can do the following to selectively pick tags:

console.log(breakdance(html, {pick: 'em'}))
//=> '_Foo_ _Baz_'
console.log(breakdance(html, {pick: 'em[class="xyz"]'}))
//=> '_Baz_'
console.log(breakdance(html, {pick: '[class="xyz"]'}))
//=> '**Bar** _Baz_'
console.log(breakdance(html, {pick: 'em,strong'}))
//=> '_Foo_ **Bar** _Baz_'
console.log(breakdance(html, {pick: ['em', 'strong']}))
//=> '_Foo_ **Bar** _Baz_'

omitEmpty

Type: array

Default: b, del, div, em, i, li, ol, s, span, strong, section, u, ul.

Array of tags to strip from the HTML when they contain only whitespace or nothing at all. Note that when this option is defined, the parser must recurse over every child node to determine if the tag is empty, which can make parsing much slower (and is the reason an array of tag names is required, versus checking all nodes).

console.log(breakdance('...', {omitEmpty: ['div', 'span']}));
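
To see why the emptiness check has to recurse, here is a rough sketch on a toy node shape (a node has either a val string or a nodes array). The isEmpty helper is hypothetical and is not taken from the breakdance source.

```javascript
// Hypothetical check: a node is empty when it has no non-whitespace
// text anywhere beneath it, so every descendant has to be inspected.
function isEmpty(node) {
  if (node.nodes) {
    return node.nodes.every(isEmpty);
  }
  return !node.val || node.val.trim() === '';
}

var blank = {type: 'div', nodes: [{type: 'text', val: '  \n'}]};
var filled = {type: 'div', nodes: [
  {type: 'span', nodes: [{type: 'text', val: 'x'}]}
]};

console.log(isEmpty(blank));
//=> true
console.log(isEmpty(filled));
//=> false
```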

one

Type: boolean

Default: undefined

Use 1. for all ordered list items. Most markdown renderers automatically renumber lists. Using 1. for all items is an easy way of not having to renumber list items when one is added or removed.

Examples

Given the following HTML ordered list:

<ol>
  <li>Foo</li>
  <li>Bar</li>
  <li>Baz</li>
</ol>

When rendered:

console.log(breakdance(list));

Results in:

1. Foo
2. Bar
3. Baz

And if options.one is true:

console.log(breakdance(list, {one: true}));

The result is:

1. Foo
1. Bar
1. Baz
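
The numbering difference shown above can be sketched without breakdance. The renderList function is a hypothetical stand-in that only illustrates the idea behind the option.

```javascript
// Hypothetical sketch: number list items incrementally, or use "1."
// for every item when `one` is true.
function renderList(items, options) {
  var one = Boolean(options && options.one);
  return items.map(function(item, i) {
    return (one ? 1 : i + 1) + '. ' + item;
  }).join('\n');
}

console.log(renderList(['Foo', 'Bar', 'Baz']));
// 1. Foo
// 2. Bar
// 3. Baz
console.log(renderList(['Foo', 'Bar', 'Baz'], {one: true}));
// 1. Foo
// 1. Bar
// 1. Baz
```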

unsmarty

Type: boolean

Default: undefined

Convert smart quotes to regular quotes.
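
As a rough illustration of this kind of conversion, the sketch below replaces curly single and double quotes with their straight equivalents. Exactly which characters breakdance converts is an assumption here, and the unsmarty function name is hypothetical.

```javascript
// Hypothetical helper: replace curly quotes with straight quotes.
// Only the four most common curly quote characters are handled.
function unsmarty(str) {
  return str
    .replace(/[\u2018\u2019]/g, '\'')
    .replace(/[\u201C\u201D]/g, '"');
}

console.log(unsmarty('\u201CIt\u2019s on!\u201D'));
//=> '"It\'s on!"'
```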

snapdragon

Type: object

Default: undefined

Pass your own instance of snapdragon. We're using snapdragon-cheerio to modify the cheerio AST to be compatible with Snapdragon, which consumes the AST and renders to markdown.

slugify

Type: function

Default: undefined

Pass a custom function for slugifying the url in anchors. By default, anchor urls are formatted to be consistent with how GitHub slugifies anchors.
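
A custom slugify function might look something like the sketch below. It assumes the function receives the anchor text as a string and returns the slug, and it is only a simplified approximation of GitHub-style slugs.

```javascript
// Hypothetical custom slugifier. Assumes it is called with a string
// and must return a string.
function slugify(str) {
  return str
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9 _-]/g, '') // drop punctuation
    .replace(/ /g, '-');          // spaces become hyphens
}

console.log(slugify('Hover Rows!'));
//=> 'hover-rows'
```

A function like this could then be passed as options.slugify.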

title

Type: boolean

Default: undefined

Output the text in the <title> tag as an h1 at the start of the generated markdown document.

Example

console.log(breakdance('<title>Foo</title>', {title: true}));
//=> '# Foo\n'

trailingNewline

Type: boolean

Default: true

Add a newline at the end of the generated markdown string.

trailingWhitespace

Type: boolean

Default: true

Trim trailing non-newline whitespace from the string.

trim

Type: boolean

Default: true

Trim leading and trailing whitespace from the string.

url

Type: function

Default: undefined

Pass a function to use on all URLs that are processed for the href attribute in <a> tags and the src attribute in <img> tags.

var markdown = breakdance(html, {
  url: function(str) {
    // do stuff to URL string
    return str;
  }
});
console.log(markdown);
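
For a more concrete idea of what such a function could do, here is a small standalone sketch. The forceHttps name is hypothetical; it follows the shape shown above (takes a URL string, returns a URL string).

```javascript
// Hypothetical URL function: upgrade http:// links to https://
// and leave every other URL untouched.
function forceHttps(str) {
  return str.replace(/^http:\/\//, 'https://');
}

console.log(forceHttps('http://example.com/a.png'));
//=> 'https://example.com/a.png'
console.log(forceHttps('/relative/path'));
//=> '/relative/path'
```

It could then be passed as the url option in place of the inline function above.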

whitespace

Type: boolean

Default: true

Normalize whitespace, courtesy of the breakdance-whitespace plugin. If you don't like the default normalization, you can disable or override this via options, or write a custom plugin.

Disable whitespace handling

breakdance('<title>Foo</title>', {whitespace: false});

handlers

Type: object

Default: undefined

Pass a function with the name of the node type to options.handlers, to override the built-in handler for any node.type.

Examples

Customize whitespace handling by overriding the built-in text handler:

breakdance('<title>Foo</title>', {
  handlers: {
    text: function(node) {
      // do stuff to text node
    }
  }
});

Customize output of the <strong> handler:

breakdance('Foo <strong>Bar</strong>', {
  handlers: {
    strong: function(node) {
      this.emit(node.val.toUpperCase());
    }
  }
});

//=> 'Foo **BAR**'

Disable the strong handler (note that breakdance would throw an error if you encounter a <strong> tag and no handlers are registered for it):

breakdance('Foo <strong>Bar</strong>', {handlers: {strong: false}});
//=> 'Foo '

If you need to do something more advanced, or you want breakdance to always use your customizations, you can also write a plugin for this.

after

TODO

var md = breakdance(html, {
  after: {
    eos: function(node) {
      // do stuff after end-of-string
    }
  }
});

before

TODO

var md = breakdance(html, {
  before: {
    div: function(node) {
      if (node.isGrid) {
        this.emit(node.html, node);
        node.nodes = []
      }
    }
  }
});

preprocess

TODO

var md = breakdance(html, {
  preprocess: function($, node) {
  }
});

postprocess

TODO

Customizing

If you don't like the defaults or need more than what the options provide, that's okay. Breakdance was made to be easy to customize:

  • Jump to the API section to learn how to hack on breakdance.
  • Visit the plugin docs to learn how to find or write plugins.

API

This section describes the API methods exposed by breakdance. To get the most out of the documentation in this section, it might help to take a moment to learn about the core concepts around which these API methods are designed.

Hacking on breakdance

Override any defaults, or add support for custom elements, attributes or options. For example, you can control how any HTML element is converted to markdown, or even how a certain element with specific attributes is converted. You can override built-in renderers, create custom renderers for custom HTML tags, or create plugins that "bundle" together your commonly used customizations or preferences.

Breakdance

Create an instance of Breakdance with the given options.

Use this format to convert HTML to markdown.

var breakdance = require('breakdance');
var str = breakdance('<strong>Let\'s dance!</strong>');
//=> "**Let's dance!**"

Or use this format if you need to use plugins:

var Breakdance = require('breakdance');
var breakdance = new Breakdance()
  .use(plugin1())
  .use(plugin2());
var str = breakdance.render('<strong>Let\'s dance!</strong>');
//=> "**Let's dance!**"

Params

  • options {Object|String}: Pass options if you need to instantiate Breakdance, or a string to convert HTML to markdown.

.use

Register a compiler plugin fn. Plugin functions should take an options object, and return a function that takes an instance of breakdance.

Params

  • fn {Function}: plugin function
  • returns {Object}: Returns the breakdance instance for chaining.

Example

// plugin example
function yourPlugin(options) {
  return function(breakdance) {
    // do stuff
  };
}
// usage
breakdance.use(yourPlugin());
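
To make the plugin pattern and chaining concrete, the sketch below uses a stand-in object. Mock and tag are hypothetical names invented for illustration; this is not the breakdance class, only the same calling convention.

```javascript
// Stand-in object for illustration only. It is NOT the breakdance class.
function Mock() {
  this.names = [];
}

Mock.prototype.use = function(fn) {
  fn.call(this, this); // the plugin receives the instance
  return this;         // return the instance for chaining
};

// plugin factory: takes options, returns the plugin function
function tag(options) {
  return function(instance) {
    instance.names.push(options.name);
  };
}

var mock = new Mock()
  .use(tag({name: 'foo'}))
  .use(tag({name: 'bar'}));

console.log(mock.names.join(', '));
//=> 'foo, bar'
```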

.preprocess

Register a plugin to be called when .parse is run, after the AST is created by cheerio, but before the AST is converted to a breakdance AST. preprocess functions are passed an instance of breakdance and the cheerio instance that was created to parse the HTML.

Params

  • fn {Function}: Plugin function
  • returns {Object}: Returns the instance for chaining.

Example

var Breakdance = require('breakdance');
var breakdance = new Breakdance();

breakdance.preprocess(function($) {
  // do stuff with cheerio AST
});

.define

Set a non-enumerable property or method on the breakdance instance. Useful in plugins for defining methods or properties to be used inside compiler handler functions.

Params

  • name {String}: Name of the property or method being defined
  • val {any}: Property value
  • returns {Object}: Returns the instance for chaining.

Example

// plugin example
breakdance.use(function() {
  this.define('appendFoo', function(node) {
    node.val += 'Foo';
  });
});

// then, in a compiler "handler" function
breakdance.set('text', function(node) {
  if (node.something === true) {
    this.appendFoo(node);
  }
  this.emit(node.val);
});

.set

Register a handler function to be called on a node of the given type. Override a built-in handler type, or register a new type.

Params

  • type {String}: The node.type to call the handler on. You can override built-in handlers by registering a handler of the same name, or register a handler for rendering a new type.
  • fn {Function}: The handler function
  • returns {Object}: Returns the instance for chaining.

Example

breakdance.set('div', function(node) {
  // do stuff to node
});

.before

Register a handler that will be called by the compiler on every node of the given type, before other handlers are called on that node.

Params

  • type {String|Object|Array}: Handler name(s), or an object of handlers
  • fn {Function}: Handler function, if type is a string or array. Otherwise this argument is ignored.
  • returns {Object}: Returns the instance for chaining.

Example

breakdance.before('div', function(node) {
  // do stuff to node
});

// or
breakdance.before(['div', 'span'], function(node) {
  // do stuff to node
});

// or
breakdance.before({
  div: function(node) {
    // do stuff to node
  },
  span: function(node) {
    // do stuff to node
  }
});

.after

Register a handler that will be called by the compiler on every node of the given type, after other handlers are called on that node.

Params

  • type {String|Object|Array}: Handler name(s), or an object of handlers
  • fn {Function}: Handler function, if type is a string or array. Otherwise this argument is ignored.
  • returns {Object}: Returns the instance for chaining.

Example

breakdance.after('div', function(node) {
  // do stuff to node
});

// or
breakdance.after(['div', 'span'], function(node) {
  // do stuff to node
});

// or
breakdance.after({
  div: function(node) {
    // do stuff to node
  },
  span: function(node) {
    // do stuff to node
  }
});
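
The ordering of "before" handlers, the main handler, and "after" handlers can be pictured with a toy compiler. All names below are hypothetical and the code does not use breakdance; it only sketches the call order described for .before and .after.

```javascript
var output = '';

// main handlers, keyed by node type
var handlers = {
  text: function(node) { output += node.val; }
};

// hooks that run before and after the main handler for a type
var before = {
  text: [function(node) { output += '['; }]
};
var after = {
  text: [function(node) { output += ']'; }]
};

function run(list, node) {
  (list || []).forEach(function(fn) { fn(node); });
}

function visit(node) {
  run(before[node.type], node);
  handlers[node.type](node);
  run(after[node.type], node);
}

[{type: 'text', val: 'a'}, {type: 'text', val: 'b'}].forEach(visit);
console.log(output);
//=> '[a][b]'
```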

.parse

Parses a string of html and returns an AST.

Params

  • html {String}: HTML string
  • options {Object}
  • returns {Object}: Abstract syntax tree

Example

var breakdance = new Breakdance();
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');

.compile

Convert a breakdance AST from .parse to markdown with the specified options.

Params

  • ast {Object}
  • options {Object}
  • returns {Object}: Returns the AST and compiled markdown string on the .output property, in case you need the object for post-processing.

Example

var breakdance = new Breakdance();
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');
var str = breakdance.compile(ast);
console.log(str);
//=> '**The Freaks Come Out at Night!**'

.render

Converts a string of HTML to markdown with the specified options. Wraps the parse and compile to simplify converting HTML to markdown with a single function call.

Params

  • html {String}
  • options {Object}
  • returns {String}: Returns a markdown string.

Example

var breakdance = new Breakdance();
var str = breakdance.render('<strong>The Freaks Come Out at Night!</strong>');
console.log(str);
//=> '**The Freaks Come Out at Night!**'

Core concepts

This document will help familiarize you with the breakdance API, as well as how the code works "under the hood", to equip you with the information you need to customize the generated output or author plugins.

Please let us know if you have any suggestions for improving the docs.

First things first

Although this document describes a few different core concepts, everything really centers around the breakdance AST. Before proceeding, we recommend you take a moment to actually log out the AST to get a first-hand look at what the AST is, and how it works.

Add the following snippet of code to a local file, such as ast.js, then run $ node ast:

var Breakdance = require('breakdance');
var breakdance = new Breakdance(/* options */);
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');
console.log(ast);

var str = breakdance.compile(ast);
console.log(str);
//=> '**The Freaks Come Out at Night!**'

Parser

The parser's job is to create the AST that will eventually be passed to the compiler.

Example

First, we start with the "root" AST object that will be used for storing nodes.

var ast = {
  type: 'root',
  nodes: []
};

Next, we need to create the parse function that is responsible for adding nodes to ast.nodes. Again, this is pseudo-code, but similar principles apply to breakdance.

var parsers = [
  function(str) {
    var match = /^[a-z]/.exec(str);
    if (match) {
      return {type: 'text', val: match[0]};
    }
  },
  function(str) {
    var match = /^\./.exec(str);
    if (match) {
      return {type: 'dot', val: match[0]};
    }
  },
  function(str) {
    var match = /^,/.exec(str);
    if (match) {
      return {type: 'comma', val: match[0]};
    }
  }
];

function parse(str) {
  var ast = {type: 'root', nodes: []};

  // add a "beginning-of-string" node
  ast.nodes.push({type: 'bos'});

  while (str.length) {
    // capture length of nodes before parsing
    var beforeLength = ast.nodes.length;

    for (var i = 0; i < parsers.length; i++) {
      var fn = parsers[i];
      var node = fn(str);
      if (node) {
        ast.nodes.push(node);
        // slice the matched value off of the string
        str = str.slice(node.val.length);
        break;
      }
    }

    // if no new nodes were added to `ast.nodes`, we know
    // that none of the parsers found a match
    if (ast.nodes.length === beforeLength) {
      throw new Error('no parsers registered for ' + str);
    }
  }

  // add an "end-of-string" node
  ast.nodes.push({type: 'eos'});
  return ast;
}

var ast = parse('abc');
console.log(ast);
// {
//   type: 'root',
//   nodes: [
//     {
//       // "beginning-of-string"
//       type: 'bos',
//     },
//     {
//       type: 'text',
//       val: 'a'
//     },
//     {
//       type: 'text',
//       val: 'b'
//     },
//     {
//       type: 'text',
//       val: 'c'
//     },
//     {
//       // "end-of-string"
//       type: 'eos'
//     }
//   ]
// }

Compiler

The breakdance compiler is responsible for iterating over the AST and generating a new string based on the information contained within each node (or child object) of the AST.

  1. "visit" each node on the AST (this will be explained through the following examples)
  2. Look for a registered handler that matches the node.type
  3. Call the handler with the node

Which might look something like this (again, in pseudo-code):

var str = '';

var handlers = {
  bos: function(node) {
    str += '<';
  },
  comma: function(node) {
    str += '-'; // we can change it to whatever we want
  },
  dot: function(node) {
    str += '-'; // and again...
  },
  text: function(node) {
    str += node.val.toUpperCase();
  },
  eos: function(node) {
    str += '>';
  }
};

function compile(ast) {
  ast.nodes.forEach(function(node) {
    // get the handler for the node "type" and call it on the node
    // this is what "visit" means
    handlers[node.type](node);
  });
}

// continuing with the AST that was created in the "parser" example
// (parse('abc') produced only text nodes, so no "-" appears)
compile(ast);
console.log(str);
//=> '<ABC>'

In principle, this is how the breakdance compiler works, along with conveniences for adding handlers, and so on.
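
The toy parser and compiler above can also be combined into one compact, self-contained sketch that exercises the comma and dot handlers. This is still illustrative pseudo-code in the same spirit, not breakdance's implementation; the types table is a hypothetical simplification of the parsers array.

```javascript
// each entry pairs a node type with the pattern that matches it
var types = [
  {type: 'text', re: /^[a-z]/},
  {type: 'dot', re: /^\./},
  {type: 'comma', re: /^,/}
];

function parse(str) {
  var ast = {type: 'root', nodes: [{type: 'bos'}]};
  while (str.length) {
    var node = null;
    for (var i = 0; i < types.length && !node; i++) {
      var match = types[i].re.exec(str);
      if (match) node = {type: types[i].type, val: match[0]};
    }
    if (!node) throw new Error('no parsers registered for ' + str);
    ast.nodes.push(node);
    // slice the matched value off of the string
    str = str.slice(node.val.length);
  }
  ast.nodes.push({type: 'eos'});
  return ast;
}

function compile(ast) {
  // handlers return strings here instead of appending to a global
  var handlers = {
    bos: function() { return '<'; },
    comma: function() { return '-'; },
    dot: function() { return '-'; },
    text: function(node) { return node.val.toUpperCase(); },
    eos: function() { return '>'; }
  };
  return ast.nodes.map(function(node) {
    return handlers[node.type](node);
  }).join('');
}

console.log(compile(parse('a,b.c')));
//=> '<A-B-C>'
```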

AST

The breakdance AST works the same way as in the earlier examples, with one addition: each node on the AST can have one of the following (never both):

  • nodes: an array of child nodes (just like the AST itself)
  • val: a string value

In fact, the AST itself is just another node. An AST with both types of nodes might look something like this:

// given the string "<strong>foo</strong>", breakdance's AST
// would look something like this:
var ast = {
  type: 'root',
  nodes: [
    {
      // "beginning-of-string"
      type: 'bos',
    },
    {
      // since <strong> elements have open and close tags,
      // the `strong` node will have a `nodes` array, for
      // storing child nodes
      type: 'strong',
      nodes: [
        {
          type: 'strong.open',
          val: ''
        },
        {
          // this could be a "text" node, or another type of tag
          // that has a `nodes` array, like `strong` itself
          type: 'text',
          val: 'foo'
        },
        {
          type: 'strong.close',
          val: ''
        }
      ]
    },
    {
      // "end-of-string"
      type: 'eos'
    }
  ]
};
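
One way to get a feel for this shape is to walk it. The sketch below uses a hypothetical collectTypes helper and a hand-written tree modeled on the example above; it does not call breakdance.

```javascript
// Hypothetical helper: collect node types depth-first, descending
// into `nodes` arrays wherever a node has one.
function collectTypes(node, types) {
  types = types || [];
  types.push(node.type);
  if (node.nodes) {
    node.nodes.forEach(function(child) {
      collectTypes(child, types);
    });
  }
  return types;
}

var tree = {
  type: 'root',
  nodes: [
    {type: 'bos'},
    {type: 'strong', nodes: [
      {type: 'strong.open', val: ''},
      {type: 'text', val: 'foo'},
      {type: 'strong.close', val: ''}
    ]},
    {type: 'eos'}
  ]
};

console.log(collectTypes(tree).join(' > '));
//=> 'root > bos > strong > strong.open > text > strong.close > eos'
```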

All together

To see how all of these pieces fit together, we need to add one more thing.

None of the nodes in the compiler example had a nodes array, so let's review how the compiler would handle nodes that do.

Visiting arrays of nodes

var str = '';

var handlers = {
  bos: function(node) {
    // do nothing
  },
  text: function(node) {
    str += node.val;
  },
  strong: function(node) {
    mapVisit(node.nodes);
  },
  'strong.open': function(node) {
    str += '**';
  },
  'strong.close': function(node) {
    str += '**';
  },
  eos: function(node) {
    // do nothing
  }
};

function visit(node) {
  if (node.nodes) {
    mapVisit(node.nodes);
  } else {
    handlers[node.type](node);
  }
}

function mapVisit(nodes) {
  nodes.forEach(function(node) {
    visit(node);
  });
}

function compile(ast) {
  visit(ast);
}

compile(ast);
console.log(str);
//=> '**foo**'

This concludes the overview of core concepts in breakdance. If you feel like something is missing (no matter how "obvious" or not), please let us know about it so we can improve this documentation for you and the next person. Thanks!

  • checklists: Get GitHub-style task list support with breakdance-checklist.
  • reflinks: Use breakdance-reflinks if you want to aggregate the urls from hrefs and src attributes at the bottom of the file as reference links

Next steps