Welcome!
Thank you for visiting the breakdance docs! Please let us know if you find any typos, misspellings or similar grammar mistakes as you are reading.
Pull requests are also appreciated.
Getting started
If you want to see how breakdance works, the CLI is the fastest way to get started.
Installing the CLI
Install the breakdance CLI with npm:
$ npm install --global breakdance-cli
This adds the bd command to your system path, allowing you to run the breakdance CLI from any directory.
$ bd
# also aliased as "breakdance" in case of conflicts
$ breakdance
Using the CLI
If you want to do a quick test-drive, add the following content to foo.html:
<h2 id=tables-hover-rows>Hover rows</h2>
<p>Add <code>.table-hover</code> to enable a hover state on table rows within a <code><tbody></code>.</p>
<div class=bs-example data-example-id=hoverable-table>
<table class="table table-hover">
<thead>
<tr>
<th>#</th>
<th>First Name</th>
<th>Last Name</th>
<th>Username</th>
</tr>
</thead>
<tbody>
<tr>
<th scope=row>1</th>
<td>Mark</td>
<td>Otto</td>
<td>@mdo</td>
</tr>
<tr>
<th scope=row>2</th>
<td>Jacob</td>
<td>Thornton</td>
<td>@fat</td>
</tr>
<tr>
<th scope=row>3</th>
<td>Larry</td>
<td>the Bird</td>
<td>@twitter</td>
</tr>
</tbody>
</table>
</div>
Run breakdance
Next, from the command line run:
$ bd foo.html
If everything is installed properly, you should now see a foo.md file in the current working directory with something like the following:
## Hover rows
Add `.table-hover` to enable a hover state on table rows within a `<tbody>`.
| # | First Name | Last Name | Username |
| --- | --- | --- | --- |
| 1 | Mark | Otto | @mdo |
| 2 | Jacob | Thornton | @fat |
| 3 | Larry | the Bird | @twitter |
CLI options
Most of the breakdance options can be set from the command line. Play around with the options on the following help menu to customize output.
Help menu
Usage: $ breakdance [options] <file> <dest>
file: The HTML file to convert to markdown
dest: Name of the markdown file to create. By default
the HTML filename is used with a .md extension
Options:
-c, --condense Collapse more than two newlines to only
two newlines. Enabled by default
-d, --domain Specify the root domain name to prefix onto
"href" or "src" paths that do not start with
"#" or contain "://"
-o, --omit One or more tags to omit entirely from
the HTML before converting to markdown.
-p, --pick One or more tags to pick entirely from the
HTML before converting to markdown.
--comments Include HTML code comments in the generated
markdown string. Disabled by default
Example
Tell breakdance to only render specific HTML elements by passing CSS selectors to the --pick flag (powered by cheerio).
For example, to only generate the <table> from the HTML in the previous example, you would run:
$ bd foo.html --pick table
Try it! It's addictive!
API Usage
Breakdance is a JavaScript-based node.js module that, once installed, can be added to any .js file using node's require() system by adding the following line of code:
var breakdance = require('breakdance');
That line of code exposes the "main export" of the breakdance library, which is a function that can be used in one of the following ways:
- called with a string to convert HTML to markdown directly, or
- treated as a constructor function and instantiated if you need to register plugins first
Example 1: Direct usage
var breakdance = require('breakdance');
console.log(breakdance('<strong>The freaks come out at night!</strong>'));
//=> '**The freaks come out at night!**'
Example 2: Constructor function
This is useful when you need to register plugins first, or you want to modify the parser or compiler before rendering.
var Breakdance = require('breakdance');
var breakdance = new Breakdance()
.use(foo())
.use(bar())
.use(baz());
console.log(breakdance.render('<strong>The freaks come out at night!</strong>'));
//=> '**The freaks come out at night!**'
Breakdance AST
In addition to registering plugins, when instantiating you can also directly access the breakdance parser and compiler.
Logging out the AST is a great way to learn how breakdance works, and will be helpful in understanding how to customize behavior if you want to write plugins.
var Breakdance = require('breakdance');
var breakdance = new Breakdance(/* options */);
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');
console.log(ast);
var str = breakdance.compile(ast);
console.log(str);
//=> '**The Freaks Come Out at Night!**'
Setting options
A number of different options are available for configuring breakdance. If you need more than what the options provide, see the customizing breakdance section.
To set options, pass an object as the last argument to breakdance:
breakdance(string[, options]);
Example
var breakdance = require('breakdance');
var options = {domain: 'https://github.com'};
console.log(breakdance('<a href="/some-link"></a>', options));
//=> '[]
Options
comments
Type: boolean
Default: undefined
Include HTML code comments in the generated markdown string. Disabled by default.
console.log(breakdance('<strong>Foo</strong> <!-- bar -->', {comments: true}));
//=> '**Foo** <!-- bar -->'
condense
Type: boolean
Default: true
Collapse more than two newlines to only two newlines. Enabled by default.
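The effect of condense can be pictured with a single regular expression. The following is a minimal sketch of the idea only; the condense helper below is a hypothetical name used for illustration and is not taken from the breakdance source.

```javascript
// Hypothetical helper (an assumption for illustration), not breakdance's own code.
function condense(str) {
  // replace any run of three or more newlines with exactly two
  return str.replace(/\n{3,}/g, '\n\n');
}

// JSON.stringify is used so the newlines are visible in the output
console.log(JSON.stringify(condense('Foo\n\n\n\nBar\n\nBaz')));
//=> "Foo\n\nBar\n\nBaz"
```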
domain
Type: string
Default: undefined
Specify the root domain name to prefix onto href or src paths that do not start with # or contain ://.
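The rule above can be sketched as a small standalone function. This is only an illustration of the described behavior: prefixDomain is a hypothetical helper, and the slash handling between domain and path is an assumption rather than something the docs specify.

```javascript
// Hypothetical helper (an assumption for illustration), not breakdance's own code.
function prefixDomain(url, domain) {
  // leave anchors and URLs that already contain a protocol untouched
  if (!domain || url.charAt(0) === '#' || url.indexOf('://') !== -1) {
    return url;
  }
  // assumption: join with exactly one slash between domain and path
  return domain.replace(/\/+$/, '') + '/' + url.replace(/^\/+/, '');
}

console.log(prefixDomain('/some-link', 'https://github.com'));
//=> 'https://github.com/some-link'
console.log(prefixDomain('#install', 'https://github.com'));
//=> '#install'
console.log(prefixDomain('https://example.com/a', 'https://github.com'));
//=> 'https://example.com/a'
```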
keepEmpty
Type: string|array
Default: undefined
Selectively keep tags that would otherwise be omitted by omitEmpty, so you don't need to redefine all of the omitted tags.
knownOnly
Type: boolean
Default: undefined
When true, breakdance will throw an error if any non-standard/custom HTML tags are encountered. If you find a tag that breakdance doesn't cover, but you think it should, please create an issue to let us know about it.
See the breakdance recipes for an example of how to add support for custom HTML elements.
leadingNewline
Type: boolean
Default: undefined
Add a newline at the beginning of the generated markdown string.
omit
Type: array|string
One or more tags to omit entirely from the HTML before converting to markdown.
Example
Given the following HTML:
<em>Foo</em> <strong class="xyz">Bar</strong> <em class="xyz">Baz</em>
You can do the following to selectively omit tags:
console.log(breakdance(html))
//=> '_Foo_ **Bar** _Baz_'
console.log(breakdance(html, {omit: 'em'}))
//=> '**Bar**'
console.log(breakdance(html, {omit: 'em[class="xyz"]'}))
//=> '_Foo_ **Bar**'
console.log(breakdance(html, {omit: '[class="xyz"]'}))
//=> '_Foo_'
console.log(breakdance(html, {omit: 'em,strong'}))
//=> ''
console.log(breakdance(html, {omit: ['em', 'strong']}))
//=> ''
pick
Type: array|string
One or more tags to pick entirely from the HTML before converting to markdown.
Example
Given the following HTML:
<em>Foo</em> <strong class="xyz">Bar</strong> <em class="xyz">Baz</em>
You can do the following to selectively pick tags:
console.log(breakdance(html, {pick: 'em'}))
//=> '_Foo_ _Baz_'
console.log(breakdance(html, {pick: 'em[class="xyz"]'}))
//=> '_Baz_'
console.log(breakdance(html, {pick: '[class="xyz"]'}))
//=> '**Bar** _Baz_'
console.log(breakdance(html, {pick: 'em,strong'}))
//=> '_Foo_ **Bar** _Baz_'
console.log(breakdance(html, {pick: ['em', 'strong']}))
//=> '_Foo_ **Bar** _Baz_'
omitEmpty
Type: array
Default: b, del, div, em, i, li, ol, s, span, strong, section, u, ul.
Array of tags to strip from the HTML when they contain only whitespace or nothing at all. Note that when this option is defined, the parser must recurse over every child node to determine if the tag is empty, which can make parsing much slower (and is the reason an array of tag names is required, versus checking all nodes).
console.log(breakdance('...', {omitEmpty: ['div', 'span']}));
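To see why the emptiness check has to recurse, here is a minimal sketch on a toy node tree. It assumes nodes that carry either a val string or a nodes array (the shape described in the core concepts); isEmpty is a hypothetical helper and does not reflect how breakdance implements the check internally.

```javascript
// Hypothetical helper (an assumption for illustration), not breakdance's own code.
function isEmpty(node) {
  if (typeof node.val === 'string') {
    // a leaf is empty when it holds only whitespace or nothing at all
    return node.val.trim() === '';
  }
  // a node with children is empty only if every child is empty,
  // so the whole subtree has to be visited
  return (node.nodes || []).every(function(child) {
    return isEmpty(child);
  });
}

var empty = {type: 'div', nodes: [{type: 'span', nodes: [{type: 'text', val: '  \n'}]}]};
var full = {type: 'div', nodes: [{type: 'text', val: ' '}, {type: 'text', val: 'Foo'}]};

console.log(isEmpty(empty));
//=> true
console.log(isEmpty(full));
//=> false
```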
one
Type: boolean
Default: undefined
Use 1. for all ordered list items. Most markdown renderers automatically renumber lists. Using 1. for all items is an easy way of not having to renumber list items when one is added or removed.
Examples
Given the following HTML ordered list:
<ol>
<li>Foo</li>
<li>Bar</li>
<li>Baz</li>
</ol>
When rendered:
console.log(breakdance(list));
Results in:
1. Foo
2. Bar
3. Baz
And if options.one is true:
console.log(breakdance(list, {one: true}));
The result is:
1. Foo
1. Bar
1. Baz
unsmarty
Type: boolean
Default: undefined
Convert smart quotes to regular quotes.
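As a rough illustration of what this conversion involves, the sketch below replaces curly single and double quotes with their straight equivalents. The unsmarty function here is a hypothetical standalone helper; which characters breakdance actually converts is not stated in these docs, so the character set is an assumption.

```javascript
// Hypothetical helper (an assumption for illustration), not breakdance's own code.
function unsmarty(str) {
  return str
    // left and right single quotation marks
    .replace(/[\u2018\u2019]/g, '\'')
    // left and right double quotation marks
    .replace(/[\u201C\u201D]/g, '"');
}

console.log(unsmarty('\u201CIt\u2019s alive!\u201D'));
//=> "It's alive!"
```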
snapdragon
Type: object
Default: undefined
Pass your own instance of snapdragon. We're using snapdragon-cheerio to modify the cheerio AST to be compatible with Snapdragon, which consumes the AST and renders to markdown.
slugify
Type: function
Default: undefined
Pass a custom function for slugifying the url in anchors. By default, anchor urls are formatted to be consistent with how GitHub slugifies anchors.
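A custom slugify function might look something like the sketch below. The signature (a string in, a string out) is an assumption, since the docs do not spell it out, and the rules shown only loosely approximate GitHub-style slugs.

```javascript
// Hypothetical slugify function; the string-in, string-out signature is assumed.
function slugify(str) {
  return str
    .trim()
    .toLowerCase()
    // drop everything except letters, digits, spaces, underscores and hyphens
    .replace(/[^a-z0-9 _-]/g, '')
    // each remaining space becomes a hyphen
    .replace(/ /g, '-');
}

console.log(slugify('Getting Started!'));
//=> 'getting-started'
console.log(slugify('CLI options & flags'));
//=> 'cli-options--flags'

// assumed usage, based on the option described above:
// breakdance(html, {slugify: slugify});
```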
title
Type: boolean
Default: undefined
Output the text in the <title> tag as an h1 at the start of the generated markdown document.
Example
console.log(breakdance('<title>Foo</title>', {title: true}));
//=> '# Foo\n'
trailingNewline
Type: boolean
Default: true
Add a newline at the end of the generated markdown string.
trailingWhitespace
Type: boolean
Default: true
Trim trailing non-newline whitespace from the string.
trim
Type: boolean
Default: true
Trim leading and trailing whitespace from the string.
url
Type: function
Default: undefined
Pass a function to use on all URLs that are processed for the href attribute in <a> tags and the src attribute in <img> tags.
var markdown = breakdance(html, {
url: function(str) {
// do stuff to URL string
return str;
}
});
console.log(markdown);
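As one possible use, the function could rewrite insecure links. The forceHttps helper below is hypothetical and only shows the kind of string-to-string transform that fits the signature in the example above.

```javascript
// Hypothetical URL transform (an assumption for illustration).
function forceHttps(str) {
  // only touch URLs that start with the http protocol
  return str.replace(/^http:\/\//i, 'https://');
}

console.log(forceHttps('http://example.com/a'));
//=> 'https://example.com/a'
console.log(forceHttps('/relative/path'));
//=> '/relative/path'

// assumed usage, following the options.url example above:
// var markdown = breakdance(html, {url: forceHttps});
```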
whitespace
Type: boolean
Default: true
Normalize whitespace, courtesy of the breakdance-whitespace plugin. If you don't like the default normalization, you can disable or override this via options, or write a custom plugin.
Disable whitespace handling
breakdance('<title>Foo</title>', {whitespace: false});
handlers
Type: object
Default: undefined
Pass a function with the name of the node type to options.handlers to override the built-in handler for any node.type.
Examples
Customize whitespace handling by overriding the built-in text handler:
breakdance('<title>Foo</title>', {
handlers: {
text: function(node) {
// do stuff to text node
}
}
});
Customize output of the <strong> handler:
breakdance('Foo <strong>Bar</strong>', {
handlers: {
strong: function(node) {
this.emit(node.val.toUpperCase());
}
}
});
//=> 'Foo **BAR**'
Disable the strong handler (note that breakdance would throw an error if it encountered a <strong> tag and no handler was registered for it):
breakdance('Foo <strong>Bar</strong>', {handlers: {strong: false}});
//=> 'Foo '
If you need to do something more advanced, or you want breakdance to always use your customizations, you can also write a plugin for this.
after
TODO
var md = breakdance(html, {
after: {
eos: function(node) {
// do stuff after end-of-string
}
}
});
before
TODO
var md = breakdance(html, {
before: {
div: function(node) {
if (node.isGrid) {
this.emit(node.html, node);
node.nodes = []
}
}
}
});
preprocess
TODO
var md = breakdance(html, {
preprocess: function($, node) {
}
});
postprocess
TODO
Customizing
If you don't like the defaults or need more than what the options provide, that's okay. Breakdance was made to be easy to customize:
- Jump to the API section to learn how to hack on breakdance.
- Visit the plugin docs to learn how to find or write plugins.
API
This section describes the API methods exposed by breakdance. For you to get the most out of the documentation in this section, it might help to take a moment to learn about the core concepts around which these API methods are designed.
Hacking on breakdance
Override any defaults, or add support for custom elements, attributes or options. For example, you can control how any HTML element is converted to markdown, or even how a certain element with specific attributes is converted. You can override built-in renderers, create custom renderers for custom HTML tags, or create plugins that "bundle" together your commonly used customizations or preferences.
Breakdance
Create an instance of Breakdance with the given options.
Use this format to convert HTML to markdown.
var breakdance = require('breakdance');
var str = breakdance('<strong>Let\'s dance!</strong>');
//=> "**Let's dance!**"
Or use this format if you need to use plugins:
var Breakdance = require('breakdance');
var breakdance = new Breakdance()
.use(plugin1())
.use(plugin2())
var str = breakdance.render('<strong>Let\'s dance!</strong>');
//=> "**Let's dance!**"
Params
- options {Object|String}: Pass options if you need to instantiate Breakdance, or a string to convert HTML to markdown.
.use
Register a compiler plugin fn. Plugin functions should take an options object, and return a function that takes an instance of breakdance.
Params
- fn {Function}: Plugin function
- returns {Object}: Returns the breakdance instance for chaining.
Example
// plugin example
function yourPlugin(options) {
return function(breakdance) {
// do stuff
};
}
// usage
breakdance.use(yourPlugin());
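The plugin pattern described above can be tried without breakdance installed. The sketch below uses a toy constructor as a stand-in; Toy, its names property and the tag plugin are all hypothetical and exist only to show how a .use method that returns the instance allows chaining.

```javascript
// Toy stand-in for a pluggable class (an assumption for illustration).
function Toy() {
  this.names = [];
}

// call the plugin with the instance, then return the instance for chaining
Toy.prototype.use = function(fn) {
  fn.call(this, this);
  return this;
};

// plugin factory: takes options, returns a function that takes the instance
function tag(options) {
  return function(app) {
    app.names.push(options.name);
  };
}

var toy = new Toy()
  .use(tag({name: 'foo'}))
  .use(tag({name: 'bar'}));

console.log(toy.names);
//=> [ 'foo', 'bar' ]
```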
.preprocess
Register a plugin to be called when .parse is run, after the AST is created by cheerio, but before the AST is converted to a breakdance AST. preprocess functions are passed an instance of breakdance and the cheerio instance that was created to parse the HTML.
Params
- fn {Function}: Plugin function
- returns {Object}: Returns the instance for chaining.
Example
var Breakdance = require('breakdance');
var breakdance = new Breakdance();
breakdance.preprocess(function($) {
// do stuff with cheerio AST
});
.define
Set a non-enumerable property or method on the breakdance instance. Useful in plugins for defining methods or properties to be used inside compiler handler functions.
Params
- name {String}: Name of the property or method being defined
- val {any}: Property value
- returns {Object}: Returns the instance for chaining.
Example
// plugin example
breakdance.use(function() {
this.define('appendFoo', function(node) {
node.val += 'Foo';
});
});
// then, in a compiler "handler" function
breakdance.set('text', function(node) {
if (node.something === true) {
this.appendFoo(node);
}
this.emit(node.val);
});
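For readers unfamiliar with non-enumerable properties, the sketch below shows one common way to define them in plain JavaScript with Object.defineProperty. The define function and the instance object are hypothetical stand-ins; whether breakdance implements .define exactly this way is an assumption.

```javascript
// Hypothetical standalone version of a "define" helper (an assumption for illustration).
function define(obj, name, val) {
  Object.defineProperty(obj, name, {
    configurable: true,
    enumerable: false, // hidden from Object.keys and for...in loops
    writable: true,
    value: val
  });
  return obj;
}

var instance = {options: {}};
define(instance, 'appendFoo', function(node) {
  node.val += 'Foo';
});

// the method does not show up when enumerating the object
console.log(Object.keys(instance));
//=> [ 'options' ]

// but it can still be called like any other method
var node = {type: 'text', val: 'Bar'};
instance.appendFoo(node);
console.log(node.val);
//=> 'BarFoo'
```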
.set
Register a handler function to be called on a node of the given type. Override a built-in handler type, or register a new type.
Params
- type {String}: The node.type to call the handler on. You can override built-in handlers by registering a handler of the same name, or register a handler for rendering a new type.
- fn {Function}: The handler function
- returns {Object}: Returns the instance for chaining.
Example
breakdance.set('div', function(node) {
// do stuff to node
});
.before
Register a handler that will be called by the compiler on every node of the given type, before other handlers are called on that node.
Params
- type {String|Object|Array}: Handler name(s), or an object of handlers
- fn {Function}: Handler function, if type is a string or array. Otherwise this argument is ignored.
- returns {Object}: Returns the instance for chaining.
Example
breakdance.before('div', function(node) {
// do stuff to node
});
// or
breakdance.before(['div', 'span'], function(node) {
// do stuff to node
});
// or
breakdance.before({
div: function(node) {
// do stuff to node
},
span: function(node) {
// do stuff to node
}
});
.after
Register a handler that will be called by the compiler on every node of the given type, after other handlers are called on that node.
Params
- type {String|Object|Array}: Handler name(s), or an object of handlers
- fn {Function}: Handler function, if type is a string or array. Otherwise this argument is ignored.
- returns {Object}: Returns the instance for chaining.
Example
breakdance.after('div', function(node) {
// do stuff to node
});
// or
breakdance.after(['div', 'span'], function(node) {
// do stuff to node
});
// or
breakdance.after({
div: function(node) {
// do stuff to node
},
span: function(node) {
// do stuff to node
}
});
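The ordering that .before and .after describe can be illustrated with a toy visitor. This is a sketch under stated assumptions: the handlers, before and after objects and the visit function are hypothetical, and the real compiler may organize its handlers differently.

```javascript
// Toy registries (assumptions for illustration), not breakdance's own code.
var out = [];
var handlers = {
  div: function(node) { out.push('div'); }
};
var before = {
  div: [function(node) { out.push('before'); }]
};
var after = {
  div: [function(node) { out.push('after'); }]
};

// call every function in a list (if any) with the node
function run(list, node) {
  (list || []).forEach(function(fn) {
    fn(node);
  });
}

// "before" handlers, then the main handler, then "after" handlers
function visit(node) {
  run(before[node.type], node);
  handlers[node.type](node);
  run(after[node.type], node);
}

visit({type: 'div'});
console.log(out.join(' > '));
//=> 'before > div > after'
```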
.parse
Parses a string of html and returns an AST.
Params
- html {String}: HTML string
- options {Object}
- returns {Object}: Abstract syntax tree
Example
var breakdance = new Breakdance();
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');
.compile
Convert a breakdance AST from .parse to markdown with the specified options.
Params
- ast {String}
- options {Object}
- returns {Object}: Returns the AST and compiled markdown string on the .output property, in case you need the object for post-processing.
Example
var breakdance = new Breakdance();
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');
var str = breakdance.compile(ast);
console.log(str);
//=> '**The Freaks Come Out at Night!**'
.render
Converts a string of HTML to markdown with the specified options. Wraps the parse and compile methods to simplify converting HTML to markdown with a single function call.
Params
- html {String}
- options {Object}
- returns {String}: Returns a markdown string.
Example
var breakdance = new Breakdance();
var str = breakdance.render('<strong>The Freaks Come Out at Night!</strong>');
console.log(str);
//=> '**The Freaks Come Out at Night!**'
Core concepts
This document will help familiarize you with the breakdance API, as well as how the code works "under the hood", to equip you with the information you need to customize the generated output or author plugins.
Please let us know if you have any suggestions for improving the docs.
First things first
Although this document describes a few different core concepts, everything really centers around the breakdance AST. Before proceeding, we recommend you take a moment to actually log out the AST to get a first-hand look at what the AST is, and how it works.
Add the following snippet of code to a local file, such as ast.js, then run $ node ast:
var Breakdance = require('breakdance');
var breakdance = new Breakdance(/* options */);
var ast = breakdance.parse('<strong>The Freaks Come Out at Night!</strong>');
console.log(ast);
var str = breakdance.compile(ast);
console.log(str);
//=> '**The Freaks Come Out at Night!**'
Parser
The parser's job is to create the AST that will eventually be passed to the compiler.
Example
First, we start with the "root" AST object that will be used for storing nodes.
var ast = {
type: 'root',
nodes: []
};
Next, we need to create the parse function that is responsible for adding nodes to ast.nodes. Again, this is pseudo-code, but similar principles apply to breakdance.
var parsers = [
function(str) {
var match = /^[a-z]/.exec(str);
if (match) {
return {type: 'text', val: match[0]};
}
},
function(str) {
var match = /^\./.exec(str);
if (match) {
return {type: 'dot', val: match[0]};
}
},
function(str) {
var match = /^,/.exec(str);
if (match) {
return {type: 'comma', val: match[0]};
}
}
];
function parse(str) {
var ast = {type: 'root', nodes: []};
// add a "beginning-of-string" node
ast.nodes.push({type: 'bos'});
while (str.length) {
// capture length of nodes before parsing
var beforeLength = ast.nodes.length;
for (var i = 0; i < parsers.length; i++) {
var fn = parsers[i];
var node = fn(str);
if (node) {
ast.nodes.push(node);
// slice the matched valued off of the string
str = str.slice(node.val.length);
break;
}
}
// if no new nodes were added to `ast.nodes`, we know
// that none of the parsers found a match
if (ast.nodes.length === beforeLength) {
throw new Error('no parsers registered for ' + str);
}
}
// add an "end-of-string" node
ast.nodes.push({type: 'eos'});
return ast;
}
var ast = parse('abc');
console.log(ast);
// {
// type: 'root',
// nodes: [
// {
// // "beginning-of-string"
// type: 'bos',
// },
// {
// type: 'text',
// val: 'a'
// },
// {
// type: 'text',
// val: 'b'
// },
// {
// type: 'text',
// val: 'c'
// },
// {
// // "end-of-string"
// type: 'eos'
// }
// ]
// }
Compiler
The breakdance compiler is responsible for iterating over the AST and generating a new string based on the information contained within each node (or child object) of the AST. To do this, the compiler must:
- "Visit" each node on the AST (this will be explained through the following examples)
- Look for a registered handler that matches the node.type
- Call the handler with the node
Which might look something like this (again, in pseudo-code):
var str = '';
var handlers = {
bos: function(node) {
str += '<';
},
comma: function(node) {
str += '-'; // we can change it to whatever we want
},
dot: function(node) {
str += '-'; // and again...
},
text: function(node) {
str += node.val.toUpperCase();
},
eos: function(node) {
str += '>';
}
};
function compile(ast) {
ast.nodes.forEach(function(node) {
// get the handler for the node "type" and call it on the node
// this is what "visit" means
handlers[node.type](node);
});
}
// using the `parse` function from the "parser" example, parse a
// string that contains a comma and a dot so every handler is used
var ast = parse('a,b.c');
compile(ast);
console.log(str);
//=> '<A-B-C>'
In principle, this is how the breakdance compiler works, along with conveniences for adding handlers, and so on.
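One detail the pseudo-code leaves out is what happens when a node has no matching handler. The sketch below is a variation on the example above that returns the string instead of appending to a shared variable, and throws when a handler is missing. The compileStrict name and the error message are hypothetical, not breakdance's actual wording.

```javascript
// Hypothetical variation on the compile example (an assumption for illustration).
function compileStrict(ast, handlers) {
  var str = '';
  ast.nodes.forEach(function(node) {
    var fn = handlers[node.type];
    if (typeof fn !== 'function') {
      throw new Error('no handler registered for "' + node.type + '"');
    }
    // each handler returns the string to append for its node
    str += fn(node) || '';
  });
  return str;
}

var ast = {
  type: 'root',
  nodes: [
    {type: 'text', val: 'a'},
    {type: 'comma', val: ','}
  ]
};

var handlers = {
  text: function(node) {
    return node.val.toUpperCase();
  }
};

try {
  compileStrict(ast, handlers);
} catch (err) {
  console.log(err.message);
  //=> 'no handler registered for "comma"'
}
```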
AST
The breakdance AST works the same way as in the earlier examples, with one addition: each node on the AST can have one of the following (never both):
- nodes: an array of child nodes (just like the AST itself)
- val: a string value
In fact, the AST itself is just another node. An AST with both types of nodes might look something like this:
// given the string "<strong>foo</strong>", breakdance's AST
// would look something like this:
var ast = {
type: 'root',
nodes: [
{
// "beginning-of-string"
type: 'bos',
},
{
// since <strong> elements have open and close tags,
// the `strong` node will have a `nodes` array, for
// storing child nodes
type: 'strong',
nodes: [
{
type: 'strong.open',
val: ''
},
{
// this could be a "text" node, or another type of tag
// that has a `nodes` array, like `strong` itself
type: 'text',
val: 'foo'
},
{
type: 'strong.close',
val: ''
}
]
},
{
// "end-of-string"
type: 'eos'
}
]
};
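A quick way to get a feel for this structure is to walk it and print each node's type, indented by depth. The sketch below repeats a compact copy of the AST above so it can run on its own; the types function is a hypothetical helper and not part of the breakdance API.

```javascript
// compact copy of the AST shown above
var ast = {
  type: 'root',
  nodes: [
    {type: 'bos'},
    {
      type: 'strong',
      nodes: [
        {type: 'strong.open', val: ''},
        {type: 'text', val: 'foo'},
        {type: 'strong.close', val: ''}
      ]
    },
    {type: 'eos'}
  ]
};

// Hypothetical helper: collect node types depth-first, indented by depth.
function types(node, depth) {
  depth = depth || 0;
  // two spaces of indentation per level
  var lines = [new Array(depth + 1).join('  ') + node.type];
  (node.nodes || []).forEach(function(child) {
    lines = lines.concat(types(child, depth + 1));
  });
  return lines;
}

console.log(types(ast).join('\n'));
// root
//   bos
//   strong
//     strong.open
//     text
//     strong.close
//   eos
```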
All together
To see how all of these pieces fit together, we need to add one more thing.
None of the nodes in the compiler example had a nodes array, so let's review how visiting child nodes would work.
Visiting arrays of nodes
var str = '';
var handlers = {
bos: function(node) {
// do nothing
},
text: function(node) {
str += node.val;
},
strong: function(node) {
mapVisit(node.nodes);
},
'strong.open': function(node) {
str += '**';
},
'strong.close': function(node) {
str += '**';
},
eos: function(node) {
// do nothing
}
};
function visit(node) {
if (node.nodes) {
mapVisit(node.nodes);
} else {
handlers[node.type](node);
}
}
function mapVisit(nodes) {
nodes.forEach(function(node) {
visit(node);
});
}
function compile(ast) {
visit(ast);
}
compile(ast);
console.log(str);
//=> '**foo**'
This concludes the overview of core concepts in breakdance. If you feel like something is missing (no matter how "obvious" or not), please let us know about it so we can improve this documentation for you and the next person. Thanks!
Related
- checklists: Get GitHub-style task list support with breakdance-checklist.
- reflinks: Use breakdance-reflinks if you want to aggregate the URLs from href and src attributes at the bottom of the file as reference links.
Next steps
- See HTML-to-markdown conversion examples
- Learn how to find or author plugins
- Visit the breakdance issue tracker to report bugs and documentation errors or make feature requests
- Contribute to breakdance