rodricios

Pinned

  1. wxpath Public

    wxpath - declarative web crawling with XPath; a Web Query Language (WQL)

    Python 112 6

  2. eatiht Public

    An exercise in unsupervised machine learning: Extract Article's Text in HTml documents.

    HTML 430 42

  3. autocomplete Public

    Autocomplete - light-weight, next-word prediction Python utility

    Python 450 73

  4. datalib/libextract Public

    Extract data from websites using basic statistical magic

    Python 506 41

  5. Flipboard's summarization algorithm, sort of

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    """

  6. crawl-to-the-future Public

    An attempt at creating a gold-standard dataset for backtesting yesterday's and today's content extractors

    HTML 35 3