Here's how we made a no-fuss RSS vulture app using trendy Electron

Keeping up with the endless torrent of stuff happening online is a losing battle. In the absence of omniscience, there's just no way to catch every bug, cockup, blowup, and scandal as it breaks. The next best thing is RSS, the aging syndication protocol that provides a convenient mechanism for aggregating online content. In …

  1. GlenP Silver badge

    First, the app has to be oriented toward the efficient display of text, to maximize the number of headlines visible. It should be a headline scanning app, not a news reading app. It should not display images at all, because they take up screen space and slow load times.

    Second, the app should sort stories chronologically. That's something more RSS apps should do.

    Third, the application should filter by time. It should contain links to new pages and not old tired stuff. That means it shouldn't retain material, which most RSS readers seem to do. There should be no concept of "read" or "unread" articles published within a specified time window. I don't want to open it up after a week and find thousands of unread pages.

    Could someone point out these objectives to the BBC, Google, and just about every other news site on the web?
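
    The second and third objectives quoted above (chronological order, a time window, no stored state) can be sketched roughly as follows. This is only an illustration in Python, not the article's Electron code; the function name `recent_headlines`, the 24-hour default and the sample headlines are all made up for the example.

```python
from datetime import datetime, timedelta, timezone


def recent_headlines(items, now, window_hours=24):
    """Return (published, title) pairs inside the time window, newest first.

    `items` is an iterable of (published, title) tuples with timezone-aware
    datetimes. Nothing is kept between calls, so there is no "read" or
    "unread" state to accumulate.
    """
    cutoff = now - timedelta(hours=window_hours)
    fresh = [item for item in items if item[0] >= cutoff]
    # Sort on the timestamp only, newest first.
    return sorted(fresh, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    now = datetime(2018, 3, 1, 12, 0, tzinfo=timezone.utc)
    sample = [
        (now - timedelta(hours=30), "Old tired stuff"),
        (now - timedelta(hours=2), "Fresh bug"),
        (now - timedelta(hours=5), "Slightly older cockup"),
    ]
    for published, title in recent_headlines(sample, now):
        print(published.isoformat(), title)
```

    Because the cutoff is recomputed on every call, opening the app after a week away shows only the last day's headlines rather than a backlog.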

  2. 45RPM

    I am disappoint

    I read the headline and immediately thought that this article might be about the reduced specification Acorn BBC computer. Surely it must be ripe for a hipster revival about now.

    1. Anonymous Coward

      Re: Acorn ... Surely it must be ripe for a hipster revival about now.

      wait a sec ... just let me quickly set up a twitstarter page for one....

    2. Numpty
      Thumb Up

      Re: I am disappoint

      Here you go:

      https://www.engadget.com/2018/02/23/acorn-brand-revived-to-sell-phones/

  3. K
    Pint

    Nice article..

    Some more of these please el-reg.. cross disciplines is fine (networks, security, coding etc)..

    Reminds me why I started reading, almost 20 years ago (I still periodically read back through the BoFH archive)

  4. Anonymous Coward

    "Neither is completely reliable because reliable webpage change detection is difficult."

    I have an application that does a batch trawl of over a thousand web pages once a week. It analyses custom web pages, Facebook, and YouTube - but also extracts video information from Vimeo etc. It has taken several years to develop a robust set of code based on Excel, VBA, and Selenium with Chrome.

    A page is split into sections by its own tuned parameter file. Originally that was automatic based on simple DIV etc tags. As page structures have become more complicated - a manual analysis is needed to construct the parameters for particular tags and attributes that signify section breaks in the information.

    Almost every week there is a change in someone's underlying structure of their HTML - which may have little, or no effect, on the displayed data. Some of the more annoying changes are down to embedded links that produce a different parameter string each time that page is loaded. These can usually be neutralised automatically.
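
    One way such ever-changing link parameters might be neutralised is to drop the volatile query keys before comparing links. The sketch below is an assumption about the general approach, written in Python rather than the VBA described above; the parameter names in `VOLATILE_PARAMS` are hypothetical and would need tuning per site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters that change on every page load.
VOLATILE_PARAMS = frozenset({"sessionid", "token", "ts"})


def neutralise_url(url, volatile=VOLATILE_PARAMS):
    """Return `url` without volatile query parameters and without a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in volatile
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


if __name__ == "__main__":
    first = neutralise_url("https://example.com/watch?v=abc&token=111")
    second = neutralise_url("https://example.com/watch?v=abc&token=222")
    print(first == second)
```

    Two loads of the same page then yield identical links, so the change detector does not see a false difference.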

    It is a time consuming whack-a-mole situation.

    Some pages are impossible to analyse as their internal structure appears to be machine generated with no consistent ID or CLASS attributes. It could be called Joycean.

    As many people have found - the first thing is to be selective about which parts of a page's HTML can be ignored. That gets rid of a lot of the variability that occurs in headers and footers and side panels. Running uBlock Origin and Ghostery also helps.

    This process ends up with a series of records that contain blocks of text/images/links - at best each block represents one complete item.

    These blocks are then filtered by checking entries in a "history" file for all the owner's pages. If an item does not appear on a page for x weeks then its history entry is deleted. This prevents history files from continually expanding - and allows for seasonal reappearances that may be useful. Video links are always remembered as they form a catalogue - with duplications caused by the same video having different identifications on various video hosting sites.
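
    The history check described above could look something like this. It is a minimal Python sketch under stated assumptions: items are reduced to hashable keys, the history is a plain dict of "last week seen", and the names `filter_new_items` and `max_absent_weeks` are invented for illustration. The always-remembered video catalogue is left out.

```python
def filter_new_items(items, history, current_week, max_absent_weeks=4):
    """Return the items not already in `history`, then update and prune it.

    `history` maps an item key to the week number in which it was last seen
    on the page. Entries not seen for `max_absent_weeks` weeks are deleted,
    so the history stays bounded and a seasonal item that comes back later
    is reported as new again.
    """
    new_items = [item for item in items if item not in history]

    # Everything on the page this week counts as seen this week.
    for item in items:
        history[item] = current_week

    # Collect stale keys first: a dict must not change size while iterating.
    stale = [
        key
        for key, last_seen in history.items()
        if current_week - last_seen >= max_absent_weeks
    ]
    for key in stale:
        del history[key]

    return new_items


if __name__ == "__main__":
    history = {}
    print(filter_new_items(["summer fete", "agm"], history, current_week=1))
    print(filter_new_items(["agm", "quiz night"], history, current_week=2))
```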

    A common HTML filtering parameter is of the form

    (FilterOnTag=TAG=ATTRIBUTE=LIKEVALUE=SPECIALFIELDMARKER=OUTPUTCONTROL=HOWTOSPLIT)

    eg

    (FilterOnTag=div=id=article*= =OutputAllowed=WholeBlock)

    (FilterOnTag=h2=No_Attribute=No_Value= =OutputAllowed=BlockSeparation)

    (FilterOnTag=div=class=footer= =OutputNotAllowed= )

    (FilterOnTag=a=id=owner*=postowner=OutputAllowed= )
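
    For anyone wanting to experiment with a parameter format like this, a rough Python reading of it follows. Only the seven "="-separated fields come from the examples above; the `FilterRule` type, the function names and the treatment of `*` as a shell-style wildcard are guesses for illustration, not a description of the actual VBA code.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class FilterRule(NamedTuple):
    tag: str
    attribute: str
    like_value: str
    special_field_marker: str
    output_control: str
    how_to_split: str


def parse_rule(line):
    """Parse one "(FilterOnTag=...)" line into a FilterRule."""
    text = line.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"rule must be wrapped in parentheses: {line!r}")
    fields = [field.strip() for field in text[1:-1].split("=")]
    if len(fields) != 7 or fields[0] != "FilterOnTag":
        raise ValueError(f"expected FilterOnTag plus six fields: {line!r}")
    return FilterRule(*fields[1:])


def rule_matches(rule, tag, attributes):
    """Check an element's tag name and attribute dict against a rule.

    Assumption: "*" in LIKEVALUE is a wildcard, and No_Attribute means
    the tag name alone decides the match.
    """
    if tag.lower() != rule.tag.lower():
        return False
    if rule.attribute == "No_Attribute":
        return True
    value = attributes.get(rule.attribute)
    return value is not None and fnmatchcase(value, rule.like_value)


if __name__ == "__main__":
    rule = parse_rule("(FilterOnTag=div=id=article*= =OutputAllowed=WholeBlock)")
    print(rule)
    print(rule_matches(rule, "div", {"id": "article-42"}))
```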

    1. adnim

      When Scroogle closed down, I wrote my own Google scraper to replace it. It worked well: no adverts, no compare-the-xyz sites, no shopping results, unless I allowed them. Unfortunately Google changes the layout, the element IDs and the classes of the result pages with annoying regularity. I gave up on trying to keep up after a couple of months. On the odd occasion I use Google now, I have settled on the element hiding rules of adblock origin.

      1. Aitor 1 Silver badge

        Scraping

        There are even websites that use templating to change the structure every week, just to prevent scraping or make it more costly.

  5. bencurthoys

    Have you tried Inoreader?

    I think it ticks all your boxes...

    Screenshot of my feeds in title only mode sorted by time: https://snag.gy/4dghw1.jpg

    It does track an "unread" count, but you can just ignore that.

    1. K

      Re: Have you tried Inoreader?

      Yeah InoReader is frickin awesome...

      I use it as a feed manager, then use News+ as the RSS reader (with the Inoreader plugin). It's comprehensive, and the developers are not constantly adding new "features" (they keep the UI up to date), so it's stayed stable and constant.

      A mute point.. they should have renamed it InoFeeder.. as that's what I use it for!

      1. Adair Silver badge

        Pedant alert (but this time it matters)!

        @K

        The word you want is 'moot' (open to debate), not 'mute' (unable to speak).

        May none of us ever stop learning. :-)

        1. Chris Watson 2

          Re: Pedant alert (but this time it matters)!

          I believe the correct word is 'moo' (having little or no practical relevance), as, like a cow's opinion, it does not matter.

          1. John Gamble

            Re: Pedant alert (but this time it matters)!

            "I believe the correct word is 'moo' (having little or no practical relevance), as, like a cow's opinion, it does not matter."

            That would be 'mu', although by now we've gone well over the homophone cliff.

          2. desht

            Re: Pedant alert (but this time it matters)!

            Ah, you've been watching Friends re-runs too?

        2. Chemical Bob
          Headmaster

          Re: 'moot', not 'mute'

          Maybe the OP didn't want to waste any more time talking about it...

        3. Jeffrey Nonken

          Re: Pedant alert (but this time it matters)!

          I usually resist pointing out grammar, spelling, punctuation and homophone errors. Doesn't keep me from pulling my few remaining hairs out, but it saves me endless fights and hours of time each day.

  6. Phil Koenig

    A headline lister?

    I've been using RSS for quite a few years and a "headline lister" sounds fairly pointless to me.

    The whole reason I use RSS readers is to avoid all the garbage on the original webpages, and to reformat the pages into something that doesn't blind me. (I pretty much despise blinding white backgrounds on anything I have to read much of.)

    I realize this may sound like some kind of declaration of war to those whose salaries depend on website advertising, but if I wanted to load all the scripts, images, tracking nonsense, ads and other junk just to read a couple of paragraphs for each article of interest I would just go to the original website and forget about RSS.

    1. Richard Parkin

      Re: A headline lister?

      “I've been using RSS for quite a few years and a "headline lister" sounds fairly pointless to me.” Me too. The headlines are a poor guide to what’s in the article, for example ... that site called The Register :-( I want a few lines of text at least and then if it is interesting, add it to my reading list in Safari. I think most RSS readers are configurable - I use Newsify and can definitely set read items to disappear.

  7. AMBxx Silver badge
    Windows

    YAFF?

    Yet another Framework?

    Pretty soon there'll be more frameworks than developers.

  8. tiggity Silver badge

    XSLT

    RSS is just XML data, and XSLT is designed to transform XML (and is as cross-platform as you could wish), so it's quite trivial to filter by date, show only headlines, etc.

    Back in the day there were many people writing about this as an easy alternative (and very flexible) to RSS aggregators / readers that did not quite do what you wanted.

    Still, there are multiple ways to skin a cat, and it's all a good learning experience (and maybe even enjoyable) writing your own code instead of using an off-the-shelf tool.

    But I would recommend that, when dealing with XML, XML processing languages and tools are an intuitive, sensible starting point.
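
    To give a feel for how little is involved in treating the feed as plain XML, here is a short sketch. Note that it is not XSLT: it is a stand-in using Python's standard-library XML parser, and the sample feed, the function name `headlines_since` and the cutoff date are all invented for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# A made-up two-item RSS 2.0 feed, just enough to exercise the function.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Old story</title>
      <pubDate>Mon, 01 Jan 2018 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>New story</title>
      <pubDate>Thu, 01 Mar 2018 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def headlines_since(rss_text, since):
    """Return the titles of items whose pubDate is at or after `since`.

    Items without a pubDate are skipped. `since` must be timezone-aware,
    because RSS dates carrying a UTC offset parse to aware datetimes.
    """
    root = ET.fromstring(rss_text)
    titles = []
    for item in root.iter("item"):
        published = item.findtext("pubDate")
        if published is None:
            continue
        if parsedate_to_datetime(published) >= since:
            titles.append(item.findtext("title", default=""))
    return titles


if __name__ == "__main__":
    cutoff = datetime(2018, 2, 1, tzinfo=timezone.utc)
    for title in headlines_since(SAMPLE_RSS, cutoff):
        print(title)
```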

  9. Barry Rueger

    TinyTinyRSS

    Been using it for years. Easy to set up (some hosting providers even include auto-installers) and thus far has never stumbled.

    https://tt-rss.org

  10. thames
    Linux

    Liferea

    Liferea seems to be pretty much the standard RSS feed reader on Linux. It's fast, configurable, and easy to use. It can also run external scripts to massage malformed feeds or even scrape web pages.

    I'm not going to criticise Vulture-feeds - hats off to you for actually building something that suits your needs rather than just regurgitating press releases. It certainly gives you insight into Electron that you wouldn't get any other way.

    However, this bit really stuck out: "vulture-feeds weighed in at 368.9 MB". Liferea, which does far more, is "594.9 kB on disk" according to Ubuntu Software Centre. That's right, less than 600 kB. Electron is mind bogglingly huge.

    Nearly all the sites that I read regularly I monitor via RSS. If a web site doesn't offer an RSS feed, then it may as well not exist so far as I am concerned. I read the articles in the web browser, but RSS is where I find out that the article exists.

    I think that much of the source of the problem with "fake news" is that too many people seem to get their news spoon fed to them from Facebook or Twitter instead of getting it directly from reputable news sources followed via RSS. RSS is decentralised, which also keeps any one company from getting a choke hold on the supply of information. That of course is why the companies who do want a stranglehold on the web don't like it.

  11. JDX Gold badge

    It should not display images at all, because they take up screen space and slow load times.

    Written just below a large photograph which added no content :)

  12. John Smith 19 Gold badge
    Unhappy

    Cross platform development frameworks are a thing again

    Back from the days when the list was Windows/AppleOS/OS2/*nix

    And with pretty much the same strengths and weaknesses.

    As usual I'll bet there are options to slim down the footprint, possibly by sharing core functions across multiple apps, so the first is 350MB+ but the rest are radically smaller, as they piggyback on the first load.

  13. DCFusor Silver badge

    MacOS Native?

    Funny, the Wikipedia page for Sublime Text lists it as cross platform and the first opsys listed is Linux, not MacOS. Just FYI...

    It works darn well on Linux here...It's the one app I've found really worth paying for.

    John Smith has it right, at least from a Linux perspective, more and more of the big apps are now Linux capable. And if I read my process table right - many are using the native libraries and it's on Windows that they'd be extra big due to being the first to need X or QT there. Things are looking up.

    Of course, Java is crummy everywhere. I agree with Linus on that one.

    Java is a horrible language

    @adair - I think definition #2 fits better: 2. of little or no practical value, meaning, or relevance; purely academic: In practical terms, the issue of her application is moot because the deadline has passed. ;~}

    1. mrtom84

      Re: MacOS Native?

      I used to use it, and had paid for it, until the gem that is VS Code came along.

  14. Anonymous Coward

    Slow news day guys?

    Or will we one day have reg.js depending on (recursively) 120 other ".js"s with elreg.js in there somewhere because taken names are a problem?

  15. Anonymous Coward
    Coffee/keyboard

    Flash for the Desktop

    I'm surprised nobody's linked this yet. Browser apps should run in THE browser. Embedding a modern browser in each app is a recipe for epic bloat.

    https://josephg.com/blog/electron-is-flash-for-the-desktop/

    NPM's a clusterf*ck too:

    https://www.theregister.co.uk/2018/02/23/npm_undoes_patch/

    https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/

    1. Anonymous Coward

      Re: Flash for the Desktop

      Sssssh. The VS Code zealots will be along soon to tell you it is lightweight.

  16. ChrisElvidge

    I use Thunderbird for everything rss related

    But the major gripe I have is with Cloudflare getting between me and the feed url https://www.theregister.co.uk/headlines.atom

    This happens irregularly, but regularly enough for me to have found "phantomjs" to get round it.

  17. Elledan
    Coat

    Call me old, but...

    One of my current laptops, which I frequently use, dates back to 2007 (AMD X2), and it's zippy enough for just about any task. The only thing the poor machine struggles with (even after doubling the RAM to 4 GB and adding a 7,200 RPM HDD) is anything involving lots of JavaScript and, of course, Java.

    Running something like TweetDeck in a browser tab is a great way to bring Linux to its knees. Running an Electron 'app' next to the browser is a ridiculous proposition.

    As a primarily C/C++ dev, I would have written an app like this in something like wxWidgets or Qt, had it cross-platform as well, and used less than 20 MB of RAM.

    Now get off my darn lawn.

  18. Christian Berger

    There's also Lazarus...

    It also allows you to build cross-platform applications, but without bundling a browser. So essentially you'll get a 1 megabyte (still huge) statically linked binary out of it. You can even create your interface by clicking around.

  19. Jeffrey Nonken

    "Professional developers tend to scoff at Electron apps, and rightly so."

    Balderdash. Jewellers' drivers and loupes are good tools, but for some jobs a hammer works fine. Script languages exist for a reason.

    I work in C and assembly on embedded systems, but I'm not a snob. I figure any tool that works for you is probably good enough, and I'm unlikely to scoff or sneer just because it's not the one I would have chosen.

    Oh sure, I might if the tool is wildly inappropriate, but that's not the case here.

    I guess I'm just a dilettante.

    1. Christian Berger

      Well, but adding a browser to your project and using web technology for GUI applications is kinda the worst you can do. The web was never made for applications, so people have to do strange things to get those working.

      The only reason why one would ever do a thing like that is because they just don't know anything else. Just like companies who rely on VBA for their processes.

      1. Anonymous Coward

        In fairness, the web isn't a bad choice for a business database app running on a central intranet or internet server. Certainly better than VBA or, worse, the ever-popular browser-based Java client. Bundling your client app up with Electron might even be a good choice if your users insist on using crap browsers & extensions.

        What are the alternatives? Everything I can think of is C++, which is tough to justify for these sorts of apps. I wish there was a better alternative. The web ruined everything.

