API Documentation
Providers
class micawber.providers.Provider(endpoint, **kwargs)
The Provider object is responsible for retrieving metadata about a given URL. It implements a method called request(), which takes a URL and any parameters and sends them to an endpoint. The endpoint should return a JSON dictionary containing metadata about the resource, which is returned to the caller.
Parameters:
- endpoint – the API endpoint that returns information about requested links
- kwargs – any additional URL parameters to send to the endpoint on each request, used for providing defaults. For example, you might use this to send an API key with every request.
request(url, **extra_params)
Retrieve information about the given url. By default, this makes an HTTP GET request to the endpoint. The url is sent to the endpoint along with any parameters given in extra_params and any parameters specified when the class was instantiated.
Raises a ProviderException if the URL is not accessible or the API times out.
Parameters:
- url – URL to retrieve metadata for
- extra_params – additional parameters to pass to the endpoint, for example a maxwidth or an API key
Return type: a dictionary of JSON data
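To make the parameter merging concrete, here is a minimal standard-library sketch of how a provider could assemble its GET URL. It assumes that per-call extra_params override the defaults given at instantiation and that the resource URL travels as the url query parameter, as the oEmbed specification describes; build_request_url is a hypothetical helper, not part of micawber.

```python
from urllib.parse import urlencode, urlsplit


def build_request_url(endpoint, url, defaults=None, **extra_params):
    """Hypothetical helper: merge default and per-call params into a GET URL.

    Assumptions: per-call extra_params override instance-level defaults,
    and the resource URL is sent as the ``url`` query parameter (oEmbed style).
    """
    params = dict(defaults or {})
    params.update(extra_params)
    params["url"] = url
    # Append with '&' if the endpoint already carries a query string.
    separator = "&" if urlsplit(endpoint).query else "?"
    return endpoint + separator + urlencode(sorted(params.items()))


# Defaults such as an API key are given once, much like Provider's **kwargs.
request_url = build_request_url(
    "http://www.youtube.com/oembed",
    "http://www.youtube.com/watch?v=54XHDUOHuzU",
    defaults={"key": "my-api-key"},
    maxwidth=600,
)
```

Sorting the parameters is only there to make the generated URL deterministic; the endpoint does not care about their order.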
class micawber.providers.ProviderRegistry([cache=None])
A registry that encapsulates a group of Provider instances, with optional caching support.
It handles matching URLs against regular expressions to find the right provider. URLs are sent to the registry via its request() method; the registry checks whether it has a provider whose regex matches the URL and, if so, requests the metadata from that provider instance.
It also exposes methods for parsing various types of text (including HTML) and either rendering oEmbed media inline or extracting embeddable links.
Parameters:
- cache – an optional cache object; it only needs to implement two methods, .get(key) and .set(key, value).
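Because the registry only relies on get and set, any object with those two methods should work as cache. The sketch below is a hypothetical in-memory cache with a per-entry time-to-live; the TTLCache name, its ttl argument, and the injectable clock are illustrative assumptions, not part of micawber. Returning None on a miss mirrors the documented behavior of Cache.get().

```python
import time


class TTLCache:
    """Hypothetical cache exposing the get/set interface the registry expects."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested
        self._data = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
```

It would be passed in like any other cache, e.g. ProviderRegistry(cache=TTLCache(ttl=3600)).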
register(regex, provider)
Register the provider for URLs matching the given regex.
Example:
    from micawber.providers import Provider, ProviderRegistry

    registry = ProviderRegistry()
    registry.register(
        r'http://\S*.youtu(\.be|be\.com)/watch\S*',
        Provider('http://www.youtube.com/oembed'),
    )
Parameters:
- regex – a regex for matching URLs of a given type
- provider – a Provider instance
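The matching step can be pictured as a list of (regex, provider) pairs scanned in registration order. This is a simplified stand-in rather than micawber's code: it assumes re.match semantics and that the first matching regex wins, uses endpoint strings in place of Provider instances, and MiniRegistry and find_provider are hypothetical names.

```python
import re


class MiniRegistry:
    """Hypothetical sketch of regex-to-provider lookup (not micawber's code)."""

    def __init__(self):
        self._registry = []  # list of (compiled regex, provider) pairs

    def register(self, regex, provider):
        self._registry.append((re.compile(regex), provider))

    def find_provider(self, url):
        # Assumption: pairs are tried in registration order; first match wins.
        for pattern, provider in self._registry:
            if pattern.match(url):
                return provider
        return None


registry = MiniRegistry()
# Plain endpoint strings stand in for Provider instances here.
registry.register(r'http://\S*.youtu(\.be|be\.com)/watch\S*', 'http://www.youtube.com/oembed')
registry.register(r'https?://(www\.)?vimeo\.com/\d+', 'https://vimeo.com/api/oembed.json')
```

Keeping the pairs in a list rather than a dict preserves registration order, which matters when two regexes could match the same URL.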
request(url, **extra_params)
Retrieve information about the given url if it matches a regex in the instance's registry. If no provider matches the URL, a ProviderException is raised; otherwise the URL and parameters are dispatched to the matching provider's Provider.request() method.
If a cache was specified, the resulting metadata is cached.
Parameters:
- url – URL to retrieve metadata for
- extra_params – additional parameters to pass to the endpoint, for example a maxwidth or an API key
Return type: a dictionary of JSON data
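Putting lookup, dispatch, and caching together, a registry's request() might look roughly like the following. Everything here is an illustrative assumption: providers are plain callables, LookupFailed is a local stand-in for micawber's ProviderException, DictCache and CachingRegistry are hypothetical, and the cache key format (URL plus sorted parameters) is made up for the sketch.

```python
import re


class LookupFailed(Exception):
    """Local stand-in for micawber's ProviderException."""


class DictCache:
    """Minimal get/set cache used only for this illustration."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class CachingRegistry:
    """Hypothetical sketch of match, dispatch and cache (not micawber's code)."""

    def __init__(self, cache=None):
        self.cache = cache  # any object with get(key) and set(key, value)
        self._registry = []  # (compiled regex, provider callable) pairs

    def register(self, regex, provider):
        self._registry.append((re.compile(regex), provider))

    def request(self, url, **extra_params):
        # Assumed key format: the URL plus its sorted parameters.
        key = url + "|" + "&".join("%s=%s" % item for item in sorted(extra_params.items()))
        if self.cache is not None:
            cached = self.cache.get(key)
            if cached is not None:
                return cached
        for pattern, provider in self._registry:
            if pattern.match(url):
                data = provider(url, **extra_params)
                if self.cache is not None:
                    self.cache.set(key, data)
                return data
        raise LookupFailed("Provider not found for %r" % url)


calls = []


def fake_provider(url, **params):
    # Stands in for Provider.request(); records every uncached fetch.
    calls.append(url)
    return dict(params, type="video", url=url)


registry = CachingRegistry(cache=DictCache())
registry.register(r"https?://example\.com/videos/\d+", fake_provider)
first = registry.request("https://example.com/videos/1", maxwidth=400)
second = registry.request("https://example.com/videos/1", maxwidth=400)
```

Including the parameters in the key means the same URL requested with a different maxwidth is fetched and cached separately.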
parse_text_full(text[, urlize_all=True[, handler=full_handler[, urlize_params=None[, **params]]]])
Parse a block of text, converting all links by passing them to the given handler. Links contained within a block of text (i.e. not on their own line) are handled as well.
Example input and output:
    IN:  'this is a pic http://example.com/some-pic/'
    OUT: 'this is a pic <a href="http://example.com/some-pic/"><img src="http://example.com/media/some-pic.jpg" /></a>'
Parameters:
- text (str) – a string to parse
- urlize_all (bool) – convert unmatched URLs into links
- handler – function used to convert metadata back into a string representation
- urlize_params (dict) – keyword arguments used to construct a link when a provider is not found and urlize_all is enabled
- params – any additional parameters to use when requesting metadata, e.g. a maxwidth or maxheight
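A custom handler is a function that turns provider metadata into a string. The sketch assumes handlers are called as handler(url, response_data, **params), which appears to match micawber's built-in full_handler and inline_handler; check this against your installed version. figure_handler is a hypothetical name: it wraps video and rich embeds in a figure element and otherwise falls back to an escaped link.

```python
import html


def figure_handler(url, response_data, **params):
    """Hypothetical handler: wrap rich embeds in <figure>, else emit a link."""
    title = html.escape(response_data.get("title") or url)
    embed_html = response_data.get("html")
    if response_data.get("type") in ("video", "rich") and embed_html:
        # Provider-supplied HTML is trusted as-is here; sanitize it if needed.
        return '<figure class="embed">%s<figcaption>%s</figcaption></figure>' % (
            embed_html,
            title,
        )
    return '<a href="%s">%s</a>' % (html.escape(url, quote=True), title)
```

It would then be passed as registry.parse_text_full(text, handler=figure_handler).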
parse_text(text[, urlize_all=True[, handler=full_handler[, block_handler=inline_handler[, urlize_params=None[, **params]]]]])
Very similar to parse_text_full(), except that URLs on their own line are rendered using the given handler, whereas URLs within blocks of text are passed to the block_handler. The default behavior renders full content for URLs on their own line (e.g. a video player), whereas URLs within text are rendered simply as links so as not to disrupt the flow of text.
- URLs on their own line are converted into full representations
- URLs within blocks of text are converted into clickable links
Parameters:
- text (str) – a string to parse
- urlize_all (bool) – convert unmatched URLs into links
- handler – function used to convert links found on their own line
- block_handler – function used to convert links found within blocks of text
- urlize_params (dict) – keyword arguments used to construct a link when a provider is not found and urlize_all is enabled
- params – any additional parameters to use when requesting metadata, e.g. a maxwidth or maxheight
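The own-line versus in-text distinction can be sketched with the standard library. This is not micawber's parser: the URL regex is a deliberately simple assumption, and classify_urls is a hypothetical helper that tags each URL as standalone (it would go to handler) or inline (it would go to block_handler).

```python
import re

# Simplified URL pattern (assumption); real-world URL detection is fussier.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def classify_urls(text):
    """Return (url, standalone) pairs; standalone means the URL fills its line."""
    results = []
    for line in text.splitlines():
        stripped = line.strip()
        for match in URL_RE.finditer(line):
            url = match.group(0)
            results.append((url, stripped == url))
    return results


sample = "Watch this:\nhttp://example.com/video/1\nor read http://example.com/post/2 later."
```

Here the first URL fills its line and would get the full treatment, while the second sits inside a sentence and would become a plain link.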
parse_html(html[, urlize_all=True[, handler=full_handler[, block_handler=inline_handler[, urlize_params=None[, **params]]]]])
Parse HTML intelligently: URLs that stand on their own within block elements are rendered as full content (e.g. a video player), whereas URLs within text are passed to the block_handler, which by default renders a simple link. URLs that are already enclosed within an <a> tag are skipped.
- URLs that are already within <a> tags are passed over
- URLs on their own in block tags are converted into full representations
- URLs interspersed with text are converted into clickable links
Note
Requires BeautifulSoup or beautifulsoup4.
Parameters:
- html (str) – a string of HTML to parse
- urlize_all (bool) – convert unmatched URLs into links
- handler – function used to convert links found on their own within a block element
- block_handler – function used to convert links found within blocks of text
- urlize_params (dict) – keyword arguments used to construct a link when a provider is not found and urlize_all is enabled
- params – any additional parameters to use when requesting metadata, e.g. a maxwidth or maxheight
extract(text, **params)
Extract all URLs from a block of text, and additionally fetch metadata for any URLs that have a matching provider.
Parameters:
- text (str) – a string to parse
- params – any additional parameters to use when requesting metadata, e.g. a maxwidth or maxheight
Return type: a 2-tuple containing a list of all URLs and a dict, keyed by URL, of any metadata. URLs for which no provider was found are not included in the dict.
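The shape of extract()'s return value can be illustrated with a stand-in lookup function. In this sketch, which is an assumption rather than micawber's implementation, URLs are de-duplicated in order of first appearance, and lookup is a hypothetical callable that returns metadata, or None when no provider matches.

```python
import re

# Simplified URL pattern (assumption).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_sketch(text, lookup, **params):
    """Return (all_urls, {url: metadata}), mirroring extract()'s 2-tuple."""
    urls, metadata = [], {}
    for url in URL_RE.findall(text):
        if url in urls:
            continue  # assumption: each URL is listed once
        urls.append(url)
        data = lookup(url, **params)
        if data is not None:
            # Only URLs with a matching provider get a dict entry.
            metadata[url] = data
    return urls, metadata


def fake_lookup(url, **params):
    # Hypothetical: pretend only /video/ URLs have a provider.
    if "/video/" in url:
        return {"type": "video", "url": url, **params}
    return None


all_urls, found = extract_sketch(
    "see http://example.com/video/1 and http://example.com/blog http://example.com/video/1",
    fake_lookup,
    maxwidth=300,
)
```

The blog URL appears in the list but not in the dict, matching the documented behavior for URLs without a provider.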
extract_html(html, **params)
Extract all URLs from an HTML string, and additionally fetch metadata for any URLs that have a matching provider. Same as extract(), but for HTML.
Note
URLs within <a> tags will not be included.
Parameters:
- html (str) – a string to parse
- params – any additional parameters to use when requesting metadata, e.g. a maxwidth or maxheight
Return type: a 2-tuple containing a list of all URLs and a dict, keyed by URL, of any metadata. URLs for which no provider was found are not included in the dict.
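The rule that URLs inside <a> tags are left out can be sketched with the standard library's html.parser instead of BeautifulSoup. This is a hypothetical illustration only (TextURLCollector and urls_outside_links are made-up names): it collects URLs from text nodes while tracking how deeply the parser is nested inside <a> elements.

```python
import re
from html.parser import HTMLParser

# Simplified URL pattern (assumption).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


class TextURLCollector(HTMLParser):
    """Hypothetical: gather URLs found in text that is not inside an <a> tag."""

    def __init__(self):
        super().__init__()
        self.anchor_depth = 0
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Text inside any <a> element is ignored.
        if self.anchor_depth == 0:
            self.urls.extend(URL_RE.findall(data))


def urls_outside_links(html_text):
    collector = TextURLCollector()
    collector.feed(html_text)
    collector.close()
    return collector.urls


page = '<p>http://example.com/video/1</p><p>See <a href="http://example.com/a">http://example.com/a</a> too.</p>'
```

Only text nodes are scanned, so URLs that appear solely in attributes such as href are never picked up either.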
micawber.providers.bootstrap_basic([cache=None[, registry=None]])
Create a ProviderRegistry and register some basic providers, including YouTube, Flickr, and Vimeo.
Parameters:
- cache – an object that implements simple get and set methods
- registry – a ProviderRegistry instance, which will be updated with the list of supported providers. If not specified, an empty ProviderRegistry will be used.
Return type: a ProviderRegistry with a handful of providers registered
micawber.providers.bootstrap_oembed([cache=None[, registry=None[, refresh=False[, **kwargs]]]])
Create a ProviderRegistry and register as many providers as are described in the oembed.com providers list.
Note
This function makes a request over the internet whenever it is called.
Parameters:
- cache – an object that implements simple get and set methods
- registry – a ProviderRegistry instance, which will be updated with the list of supported providers. If not specified, an empty ProviderRegistry will be used.
- refresh (bool) – force refreshing the provider data rather than attempting to load it from the cache (if a cache is used)
- kwargs – any default keyword arguments to use with providers
Return type: a ProviderRegistry with support for the providers listed by oembed.com
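The oembed.com providers list describes each provider's URL schemes with * wildcards. One way to turn such entries into (regex, endpoint) pairs for register() is sketched below. It is an assumption about the conversion, not necessarily what bootstrap_oembed does; scheme_to_regex and provider_pairs are hypothetical helpers, and the inline JSON is a trimmed, illustrative entry in the list's format rather than live data.

```python
import json
import re

# Trimmed, illustrative entry in the shape of the oembed.com providers list.
PROVIDERS_JSON = """
[
  {
    "provider_name": "YouTube",
    "provider_url": "https://www.youtube.com/",
    "endpoints": [
      {
        "schemes": ["https://*.youtube.com/watch*", "https://youtu.be/*"],
        "url": "https://www.youtube.com/oembed"
      }
    ]
  }
]
"""


def scheme_to_regex(scheme):
    """Turn a wildcard scheme such as https://*.youtube.com/watch* into a regex."""
    # Escape everything, then let each * match any run of non-space characters.
    return "^" + re.escape(scheme).replace(r"\*", r"\S*") + "$"


def provider_pairs(providers_json):
    """Yield (regex, endpoint) pairs suitable for passing to register()."""
    for provider in json.loads(providers_json):
        for endpoint in provider.get("endpoints", []):
            # Assumption: a {format} placeholder in the endpoint means "json".
            url = endpoint["url"].replace("{format}", "json")
            for scheme in endpoint.get("schemes", []):
                yield scheme_to_regex(scheme), url


pairs = list(provider_pairs(PROVIDERS_JSON))
```

Each pair could then be registered against a Provider built from the endpoint URL.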
micawber.providers.bootstrap_embedly([cache=None[, registry=None[, refresh=False[, **kwargs]]]])
Create a ProviderRegistry and register as many providers as are supported by embed.ly. Valid services are fetched from http://api.embed.ly/1/services/python, parsed, and then registered.
Note
This function makes a request over the internet whenever it is called.
Parameters:
- cache – an object that implements simple get and set methods
- registry – a ProviderRegistry instance, which will be updated with the list of supported providers. If not specified, an empty ProviderRegistry will be used.
- refresh (bool) – force refreshing the provider data rather than attempting to load it from the cache (if a cache is used)
- kwargs – any default keyword arguments to use with providers, useful for specifying your API key
Return type: a ProviderRegistry with support for embed.ly
Example:
    from micawber.providers import bootstrap_embedly

    # If you have an API key, you can specify it here.
    pr = bootstrap_embedly(key='my-embedly-key')
    pr.request('http://www.youtube.com/watch?v=54XHDUOHuzU')
micawber.providers.bootstrap_noembed([cache=None[, registry=None[, refresh=False[, **kwargs]]]])
Create a ProviderRegistry and register as many providers as are supported by noembed.com. Valid services are fetched from http://noembed.com/providers, parsed, and then registered.
Note
This function makes a request over the internet whenever it is called.
Parameters:
- cache – an object that implements simple get and set methods
- registry – a ProviderRegistry instance, which will be updated with the list of supported providers. If not specified, an empty ProviderRegistry will be used.
- refresh (bool) – force refreshing the provider data rather than attempting to load it from the cache (if a cache is used)
- kwargs – any default keyword arguments to use with providers, useful for passing the nowrap option to noembed
Return type: a ProviderRegistry with support for noembed
Example:
    from micawber.providers import bootstrap_noembed

    # Send the nowrap option to noembed with every request.
    pr = bootstrap_noembed(nowrap=1)
    pr.request('http://www.youtube.com/watch?v=54XHDUOHuzU')
Cache
class micawber.cache.Cache
A reference implementation of the cache interface used by the ProviderRegistry.
Example:
    from micawber import Cache, bootstrap_oembed

    cache = Cache()  # Simple in-memory cache.

    # Now our oembed provider will cache the responses for each URL we
    # request, which can provide a significant speedup.
    pr = bootstrap_oembed(cache=cache)
get(key)
Retrieve the value stored under key from the cache, or None if it is not present.
set(key, value)
Store value in the cache under key.
class micawber.cache.PickleCache([filename='cache.db'])
A cache that uses pickle to store data.
Note
To use this cache class, be sure to call load() when initializing your cache and save() before your app terminates, so that cached data is persisted.
load()
Load the pickled data into memory.
save()
Store the in-memory cache in the external file.
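The load()/save() lifecycle from the note can be sketched with the standard library pickle module. FilePickleCache is a hypothetical class written for illustration, not micawber's PickleCache; it keeps a dict in memory and round-trips it through a file, which is the pattern the note describes.

```python
import os
import pickle
import tempfile


class FilePickleCache:
    """Hypothetical dict-backed cache persisted with pickle (illustration only)."""

    def __init__(self, filename="cache.db"):
        self.filename = filename
        self._cache = {}

    def get(self, key):
        return self._cache.get(key)

    def set(self, key, value):
        self._cache[key] = value

    def load(self):
        # Call once at startup; a missing file just means an empty cache.
        # Only load pickle files you trust: unpickling can run arbitrary code.
        if os.path.exists(self.filename):
            with open(self.filename, "rb") as fh:
                self._cache = pickle.load(fh)

    def save(self):
        # Call before the app exits so cached responses survive restarts.
        with open(self.filename, "wb") as fh:
            pickle.dump(self._cache, fh)


path = os.path.join(tempfile.mkdtemp(), "cache.db")
first_run = FilePickleCache(path)
first_run.load()
first_run.set("http://example.com/video/1", {"type": "video"})
first_run.save()

# A fresh instance (think: the next process start) sees the saved data.
second_run = FilePickleCache(path)
second_run.load()
```

Forgetting save() simply loses whatever was cached since the last load, which is why the note stresses calling it before the app terminates.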
class micawber.cache.RedisCache([namespace='micawber'[, timeout=None[, **conn]]])
A cache that uses Redis to store data.
Note
Requires the redis-py library: pip install redis
Parameters:
- namespace – prefix for cache keys
- timeout (int) – expiration timeout in seconds (optional)
- conn – keyword arguments to pass when initializing the Redis connection