Parallella, Raspberry Pi, FPGA & All That Stuff

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6592
Credit: 329852232
RAC: 294883

Aha. A single post ( 01/04/2017 ) on the Parallella forum by Mr Olofsson makes it pretty clear :

 

Quote:

I am no longer an employee of Adapteva (now at DARPA). Unfortunately I can't be involved with Adapteva day to day operations, but this was the status when I left:

  • Adapteva the company will stay open indefinitely
  • Ola Jepsson is still actively involved with supporting Parallella
  • Parallella (the board and software) is open source and will live and die with the community. Adapteva will continue to be a member of that community.
  • Epiphany processor/NOC IP has been available for licensing since 2011 for folks who want to build their own chips.
  • Parallella boards will continue to be sold through distributors (approx 3,000 built boards in stock, 25,000 Epiphany-III chips in stock)

I will write up a blog post to set the record straight at parallella.org as well.

Rather like the Roman Emperor Honorius, in a letter to the Britons, when the Anglo-Saxons turned up : "You must look to your own defenses" ... :-((

He obviously got a better offer, which leaves open the question of whether Adapteva, sans Olofsson, will produce the Epiphany V. I can't find any relevant statements by Mr Jepsson and nothing to see here. I'd have to say it doesn't look good. I think the parrot is dead. Screw it. I had some good ideas.

Cheers, Mike. 

( edit ) "... IP has been available for licensing .... for folks who want to build their own chips". LOL.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

KF7IJZ
Joined: 27 Feb 15
Posts: 110
Credit: 6108311
RAC: 0

I picked up an Asus Tinker Board today from the local Microcenter ($60).  Quad-core 1.8 GHz ARMv7 32-bit w/ 2 GB of RAM.  Same form factor as the RPi.  Some observations - it runs much hotter than the Pi - 30C just idling WITH the heatsink.  I have simply done an apt-get install boinc and am up and running.  You can track the performance of this guy here:  https://einsteinathome.org/host/12523666

I need to dig out the posts about calculating your own wisdom for FFT, as this thing would not have that available in BOINC/E@H.  Also, BOINC doesn't see the GPU, which is supposed to be a fast Mali.  I'm not smart enough to see if it's possible to get this to work.
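
For the wisdom step, FFTW ships a command-line planner (fftwf-wisdom for single precision) that plans the requested transform sizes and writes the result to a file. Below is a minimal sketch of driving it from Python. The transform size (a 2^22-point real forward transform), the output file name, and where the E@H app would look for the file are all assumptions for illustration, not something confirmed in this thread.

```python
import shutil
import subprocess


def wisdom_command(points, out_file="wisdomf"):
    """Build an fftwf-wisdom command line for one real, out-of-place, forward FFT.

    The size argument follows the tool's <type><placement><direction><length>
    syntax: r = real, o = out-of-place, f = forward.
    The default out_file name is a placeholder, not a documented E@H path.
    """
    return ["fftwf-wisdom", "-o", out_file, f"rof{points}"]


cmd = wisdom_command(2 ** 22)
print(" ".join(cmd))

# Planning can take a long time on a small ARM board, so this sketch only
# runs the planner when asked to and when the tool is actually installed.
RUN_PLANNER = False
if RUN_PLANNER and shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)
```

Which sizes and directions the app really needs would have to come from the earlier forum posts on the subject.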

So far I'm 17 minutes in and already 4.5% done on all tasks :)

My YouTube Channel: https://www.youtube.com/user/KF7IJZ
Follow me on Twitter: https://twitter.com/KF7IJZ

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6592
Credit: 329852232
RAC: 294883

Now that you mention tinkering :

I'm having another look & play ( aka code & pray ) upon my Nexys-4 Trainer Board ( an ageing design superseded by its DDR2 upgrade which is currently going for $480 AUD ). The embedded FPGA has much of interest and comes with superb free design tools ( aka WebPack ) from source code right through to loading the array via a JTAG port ( read : programming cable adapter ).

{ Well there is a minor screw about to get that suite to work on Windows 8 vs Windows 7. Plus one has to use an older hardware interface software ( read : device driver + GUI called Digilent Adept ) to handle the transfers, initialisation etc. }

Now you may well ask : why would I ? One initial gag with the Parallella was that one would get a really cheap FPGA + ARM-A9 + RAM etc ie. a system one could use quite well for many reasons and totally ignore the Epiphany chip ! I recall that Mr Olofsson didn't see the joke. But it remains true that the Zynq Z7010/Z7020 chip within eg. the E16G301 Parallella board, has an Artix-7 FPGA variant from Xilinx, that being the same type as for the Nexys-4 but with substantially less mojo ( only about 25% of the Configurable Logic Blocks ). My trainer board has the XC7A100T Artix-7.

Of extreme interest is that both Nexys-4s, with & without DDR2, have 240 independent Digital Signal Processors units or 'slices' available. The short answers are :

- one could program them to produce fused-multiply-add sequences ie. the inner product of a vector of twiddles with a partial/temporary DFT/FFT sub-vector. Using two concurrent DSPs one may manage single precision float operands, operating with carries et al managed properly ( b/w the lesser & more significant components ). It's reachable and notably can input 2 x 18 = 36 bit 2's complement, has even longer intermediates, which means one can be way cool/safe about rounding until the very end ( that's the fused bit ).

- one doesn't have to panic in the extreme to design, debug, screw-up, test, forget exceptions, etc ..... in order to create floating-point arithmetic ability. Disasters will conveniently come quicker and faster. There will be shorter turnaround on the flaming wreckages. Thus merely mid-to-high range panic will do nicely ..... Epic ! That allows the focus to be upon data-flow elements/synchrony and RAM access.

- speaking of which the Nexys-4 DDR2 is provided with "a VHDL reference module that wraps the complexity of a DDR2 controller and is backwards compatible with the asynchronous SRAM interface of the CellularRAM, with certain limitations." This means in particular that one doesn't have to worry about any time-domain crossing circuitry for this aspect. Double Epic.

- One gets 128 MB of such RAM. That is eight times what is needed for both input and output data vector storage ( purely real operands ) for any 2^22 point FFT much beloved here at E@H. Triple Epic.

- AND ONE ALSO GETS* TO FIT IN ALL THOSE 2^22 TWIDDLES AT SINGLE PRECISION, BUT NOW PRE-CALCULATED ONLY ONCE FOR AS MANY FFT INSTANCES AS ONE SUBSEQUENTLY PLEASES. !! Quadruple Epic.

- get the data vectors on and off the board via USB or Ethernet. This would likely need bespoke host-side software. Needs thought, but might submit to some cheating ...... :-)
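
The storage arithmetic in the list above can be checked with a few lines of Python. This is only a sketch under stated assumptions: each vector is taken as 2^22 single-precision values of 4 bytes each, the twiddle table is assumed to be the same size ( complex twiddles would double it ), and "128 MB" is read as 128 MiB.

```python
POINTS = 2 ** 22                 # FFT length discussed above
BYTES_PER_FLOAT32 = 4            # single precision
BOARD_RAM_BYTES = 128 * 2 ** 20  # on-board RAM, read as 128 MiB
MIB = 2 ** 20


def vector_bytes(points, bytes_per_value=BYTES_PER_FLOAT32):
    """Storage for one vector of real single-precision values."""
    return points * bytes_per_value


one_vector = vector_bytes(POINTS)

# Assumed layout: one input vector, one output vector, one twiddle table.
budget = {"input": one_vector, "output": one_vector, "twiddles": one_vector}
total = sum(budget.values())

for name, size in budget.items():
    print(f"{name:9s} {size // MIB:4d} MiB")
print(f"{'total':9s} {total // MIB:4d} MiB of {BOARD_RAM_BYTES // MIB} MiB")
print(f"vectors that fit in RAM: {BOARD_RAM_BYTES // one_vector}")
```

Under those assumptions one vector is 16 MiB, so the board holds eight of them, and input, output and twiddles together leave well over half the RAM free.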

Cheers, Mike.

*  a set of steak knives and a handy clothes brush if you call 1800 SUXTOBEU with your credit card in the next 15 minutes

( edit ) The source code of my choice is VHDL, an acronym containing another acronym : Very High Speed Integrated Circuit Hardware Description Language. VHSIC is more an historical than interesting term. One ought read everything backwards : a Language for the Description of Hardware that is of Very High Speed Integrated Circuit type.

Crucially one designs by description ie. tell the tools what is being attempted in a behavioural/structural fashion. It is strongly typed and suffers no ambiguities. Not tedious but exact. As always, Modularity Is King. The tools do all the low level BS in accordance with any chosen constraints ( timings, external connections etc ). If one is too ambitious for a given target/implementation/constraints these tools will readily inform thee. The tools know the target FPGA in fantastic detail and the various algorithms that map, place and route are written by the best in the business, and can engage much inferential logic to keep one out of the digital mud. Even the simplest designs are perused and the returned messages give some really cool tips ( of course one may ignore them if one already knew the tip. LOL, as if .... ).

At the VHDL level it comes down to deciding and thus describing what ( possibly compound ) logic elements are to be connected to what. One can catch a break if something of interest is already in the public domain, and possibly adaptable, compared to some quite expensive intellectual property. With the WebPack there are three major opportunities to simulate ( read : find DimWitted Mistakes ) as the design trends towards final implementation.

Synchronous character comes in when state may be remembered ie. flip-flops/latches that hold output, so counters ie. clocks may exist. When clocks exist then times for which registers may be assumed to have valid data ( as opposed to catching them during transitions or merely waiting for some sequence point to be concurrently reached ) are realisable. The boards have many accessible oscillators ( up to 480 MHz ).

BTW : Concurrent means parallel. Curiously, and potentially anti-intuitively, it doesn't mean sequential when considered at the description stage of design. It doesn't mean instantaneous either. It doesn't mean concurrent pathways finalise simultaneously even if they start so. The tools, thank heavens, know all the gate and signal path delays.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Anonymous

KF7IJZ wrote:
I picked up an Asus Tinker Board today from the local Microcenter ($60).  Quad-core 1.8 GHz ARMv7 32-bit w/ 2 GB of RAM.  Same form factor as the RPi.

I'm currently looking into this - an Odroid XU4 (http://ameridroid.com/products/odroid-xu4)

mostly because I have a USB 3.0 camera and this was the only SBC I could find in "this size" that provided BOTH USB 3.0 AND USB 2.0.

I have two Odroids (ending in numbers 284 and 584) that are turning in some good numbers for SBCs.  These boards are running user N30dG's app, as are all of my Pi 3s.  I would imagine this Odroid XU4 would be a close performer also.

KF7IJZ
Joined: 27 Feb 15
Posts: 110
Credit: 6108311
RAC: 0

First results are coming back at around 19000 seconds on the Tinker Board.  As a test, I also installed the highly optimized version of the app on one of my Pi 3s.  It's returning results in about 21000 seconds.  I'm starting to look into whether or not there's a way to compile a highly optimized version of the BRP4 app for the Tinker Board.
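
To put those run times in per-board terms, here is a rough throughput sketch. It assumes each board crunches four tasks at once (one per core) and that every task takes the quoted time; real run times will vary from task to task.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_day(seconds_per_task, concurrent_tasks=4):
    """Tasks finished per day when `concurrent_tasks` run side by side."""
    return concurrent_tasks * SECONDS_PER_DAY / seconds_per_task


tinker = tasks_per_day(19000)  # stock app on the Tinker Board
pi3 = tasks_per_day(21000)     # optimized app on a Pi 3

print(f"Tinker Board: {tinker:.1f} tasks/day")
print(f"Pi 3:         {pi3:.1f} tasks/day")
print(f"Tinker/Pi 3 ratio: {tinker / pi3:.3f}")
```

On these numbers the stock app on the Tinker Board is already about 10% ahead of the optimized app on the Pi 3, which is why an optimized Tinker build looks worth chasing.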

My YouTube Channel: https://www.youtube.com/user/KF7IJZ
Follow me on Twitter: https://twitter.com/KF7IJZ

steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

KF7IJZ wrote:
I picked up an Asus Tinker Board today from the local Microcenter ($60).  Quad-core 1.8 GHz ARMv7 32-bit w/ 2 GB of RAM.  Same form factor as the RPi.  Some observations - it runs much hotter than the Pi - 30C just idling WITH the heatsink.

I should get one, too.

KF7IJZ wrote:
I have simply done an apt-get install boinc and am up and running

This is nice to hear!

I am collecting IT-talented folks that would like to help preparing the Debian packages not only for the BOINC client but also for the scientific applications like the ones of E@H. For instance there is already a very nicely performing  and easily installable boinc-app-seti package in Debian.

The advantage is that regular users can easily recompile for the CPU they have, while generically built clients are likely to be forced into some sort of compromise. But frankly, even though we have had the boinc-app-seti package for a while now, the number of installations reported on https://qa.debian.org/popcon.php?package=boinc-app-seti (expect numbers for Ubuntu some factor of 10 higher) is not all that impressive when compared with the absolute number of Linux contributions. For the less mainstream platforms, though, I presume that fraction to be above average. Anyway, if there are folks out there reading this who feel positive towards maintaining a software package in Debian that constitutes E@H, please contact me. It would need basic programming skills; I can help with the Debian-specific bits.

KF7IJZ wrote:
You can track the performance of this guy here:  https://einsteinathome.org/host/12523666

A BananaPi R1 (with a lame SD card, admittedly) is here: https://einsteinathome.org/de/host/12523840 .  Your Tinker completes tasks three times faster, and the BananaPi has only two cores vs the Tinker's four.

KF7IJZ
Joined: 27 Feb 15
Posts: 110
Credit: 6108311
RAC: 0

steffen_moeller wrote:
I am collecting IT-talented folks that would like to help preparing the Debian packages not only for the BOINC client but also for the scientific applications like the ones of E@H. For instance there is already a very nicely performing  and easily installable boinc-app-seti package in Debian.

I think this is a great idea!  To take it one step further, I would love to see platform optimized versions.  For instance, the work that N30dG did on the E@H BRP4 app to tune it specifically for the RPi 3 has it returning results in half the time.  I made the effort to put his software on my Pi 3 farm this weekend so I have doubled throughput (processing per WU went from 40Ksec to 20Ksec) without spending a dime!

I am interested in trying to do the same thing for the new Tinkerboard as I believe that there are gains that could be made there as well.  Unfortunately I don't have the experience to be able to do this directly and will be trying to glean a lot from the forum and will be asking a lot of questions throughout.  I would love to get to the point that there is a guide on HOW to compile platform optimized versions of the app regardless of what ARM board you're using.  Not all boards have the level of Raspberry Pi support/community so I would expect that it's up to us to optimize for our Hardware.  The Tinker Board has a better chance of having legs than a lot of other boards because it's produced by a major vendor but they don't seem to have launched a community around it like the Pi foundation did.

 

My YouTube Channel: https://www.youtube.com/user/KF7IJZ
Follow me on Twitter: https://twitter.com/KF7IJZ

steffen_moeller
Joined: 9 Feb 05
Posts: 78
Credit: 1773655132
RAC: 0

KF7IJZ wrote:
steffen_moeller wrote:
I am collecting IT-talented folks that would like to help preparing the Debian packages not only for the BOINC client but also for the scientific applications like the ones of E@H. For instance there is already a very nicely performing  and easily installable boinc-app-seti package in Debian.

I think this is a great idea!  To take it one step further, I would love to see platform optimized versions.  For instance, the work that N30dG did on the E@H BRP4 app to tune it specifically for the RPi 3 has it returning results in half the time.  I made the effort to put his software on my Pi 3 farm this weekend so I have doubled throughput (processing per WU went from 40Ksec to 20Ksec) without spending a dime!

Thank you for your positive reply. I have seen your youtube video on setting up an RPi farm [1] - well done!

We should indeed think about having different versions for ARM, even though I am not completely confident about how to best get towards such an ARM-Akosziation [2]. But once we can show some momentum, I am confident we will get all the directions we ask for.

For SETI, Debian offers just two versions - one with graphics enabled (and the dependencies dragged in) and the other plain headless. One could probably think of something more diverse for the ARM product portfolio. I am just uncertain about how much Debian should care about it vs some auto-detection by the client.

[1] https://www.youtube.com/watch?v=KJKhRLKXr-Q
[2] https://einsteinathome.org/content/there-any-reward-akos-fekete-regarding-his-optimizations

KF7IJZ wrote:
I am interested in trying to do the same thing for the new Tinkerboard as I believe that there are gains that could be made there as well.  Unfortunately I don't have the experience to be able to do this directly and will be trying to glean a lot from the forum and will be asking a lot of questions throughout.  I would love to get to the point that there is a guide on HOW to compile platform optimized versions of the app regardless of what ARM board you're using.

The instructions are mostly here: https://einsteinathome.org/de/application-source-code-and-license and I had already compiled LALSuite, if I recall correctly. From there into Debian it is mostly a matter of:
* separate the common bits already in Debian from the bits that are unique to E@H and should then be in newly contributed Debian packages
* decide what is of general value and likely already living as a sub-package on its own which should then also be packaged independently
* automate the build process and the installation

We have the "experimental" section of Debian to toy around in. But for the first upload to the real thing ("unstable") and subsequent backports to what the world is using today, we need to get the above points right.

KF7IJZ wrote:
Not all boards have the level of Raspberry Pi support/community so I would expect that it's up to us to optimize for our Hardware.  The Tinker Board has a better chance of having legs than a lot of other boards because it's produced by a major vendor but they don't seem to have launched a community around it like the Pi foundation did.

Should it happen that we ever get E@H packages into Debian, then we have auto-covered the RPi 3, the Tinker Board and whatever else runs Debian. However, the cross-platform compatibility of the boinc-apps provided with Debian is, to me, just a trigger. What keeps me going is the extra communication between those astrophysicists and the likely not-so-average Joe software engineer. This also includes ties with the electro-engineers who traditionally are closest to FPGA-based application acceleration for Bikeman. And I also see the Debian packages facilitating vacation/home office setups for the one or other LIGO researcher. And since I am mostly after that communication between ourselves, I want to have that from the beginning. Maybe someone knowing a bit more about what is happening next year (what E@H software will be the most invariant as a target, not featuring too many dependencies but nice to have as a community former) could describe his/her PoV to advise us and set some milestones (what software, what external test data (if any), in what order).

I just created this Google Group https://groups.google.com/forum/#!forum/debian-package-for-einstein-at-home to get ourselves organized. Please all just join it if you think you can help and then we start with some introductions next week, I propose.

Anonymous

KF7IJZ wrote:
steffen_moeller wrote:
I am collecting IT-talented folks that would like to help preparing the Debian packages not only for the BOINC client but also for the scientific applications like the ones of E@H. For instance there is already a very nicely performing  and easily installable boinc-app-seti package in Debian.

I think this is a great idea!  To take it one step further, I would love to see platform optimized versions.  For instance, the work that N30dG did on the E@H BRP4 app to tune it specifically for the RPi 3 has it returning results in half the time.  I made the effort to put his software on my Pi 3 farm this weekend so I have doubled throughput (processing per WU went from 40Ksec to 20Ksec) without spending a dime!

I am interested in trying to do the same thing for the new Tinkerboard as I believe that there are gains that could be made there as well.  Unfortunately I don't have the experience to be able to do this directly and will be trying to glean a lot from the forum and will be asking a lot of questions throughout.  I would love to get to the point that there is a guide on HOW to compile platform optimized versions of the app regardless of what ARM board you're using.  Not all boards have the level of Raspberry Pi support/community so I would expect that it's up to us to optimize for our Hardware.  The Tinker Board has a better chance of having legs than a lot of other boards because it's produced by a major vendor but they don't seem to have launched a community around it like the Pi foundation did.

 

At the risk of sounding repetitive/boring, N30dG's app is running on two of my Odroids and they are leaving the Pis behind: ~1242 - 1642 average credit (13Ksec per WU).  If I can be of help from a test perspective, let me know.

KF7IJZ
Joined: 27 Feb 15
Posts: 110
Credit: 6108311
RAC: 0

Howdy folks - wanted to share an update about my Pi 3 stack.  At this moment, 6/8 of my Pi3s are offline.

microSD cards are the bane of crunching and have been the cause of EVERY node failure I have experienced over the last year.  I was in the process of identifying solid replacements that use the much more robust MLC NAND to increase reliability (Transcend High Endurance cards seem to be the winner on longevity vs cost, per the dashcam community) when I decided that I had had enough!

I decided to follow the booting tutorials from the Pi Foundation (https://www.raspberrypi.org/documentation/hardware/raspberrypi/bootmodes/README.md).  I started with my rpi3-0 node (the primary node in the stack, responsible for DHCP and NAT for the local network).  I was only interested in trying HDD or SSD based USB solutions.  I was never successful with booting any HDD USB drive (I had several drives and 3 different interfaces).  I suspect that this is largely due either to the power needed to spin the drive or to the drive just not coming online fast enough for the Pi.  What did work was a Samsung 850 EVO mSATA drive (128 GB) inside a Sabrent USB to mSATA box (https://www.amazon.com/Sabrent-Enclosure-Adapter-Support-EC-UKMS/dp/B00LRZPNHM/ref=sr_1_3?ie=UTF8&qid=1493652302&sr=8-3&keywords=usb+msata+adapter).  The one gotcha I ran into with this arrangement was that I had to remove the instruction from cmdline.txt that auto-extends the file system.  Whenever it did this, the Pi wouldn't boot from the USB drive any longer.  I was however able to successfully resize the file system manually.
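
For anyone repeating the file system gotcha above, here is a minimal sketch of stripping the auto-extend instruction from a kernel command line. It assumes the instruction is the init=...init_resize.sh entry that Raspbian images of this era place in cmdline.txt on the boot partition; the sample line is made up for illustration and is not copied from my card.

```python
def strip_auto_resize(cmdline):
    """Return the kernel command line without the first-boot resize hook.

    Any token of the form init=...init_resize... is dropped; every other
    token is kept in its original order.
    """
    tokens = cmdline.split()
    kept = [
        token for token in tokens
        if not (token.startswith("init=") and "init_resize" in token)
    ]
    return " ".join(kept)


# Hypothetical example line, not taken from a real installation
example = (
    "dwc_otg.lpm_enable=0 console=tty1 root=/dev/sda2 rootfstype=ext4 "
    "rootwait init=/usr/lib/raspi-config/init_resize.sh"
)
print(strip_auto_resize(example))
```

Back up the original file before overwriting it, since a broken command line will stop the Pi from booting at all.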

Given my success here, I knew I wanted ALL my Pis to be without microSD.  I configured rpi3-0 to be a dnsmasq server and followed the Pi Foundation instructions for configuring TFTP and booting services.  I got to a place where I could boot a Pi over the network, but it required an SD card w/ a copy of bootcode.bin on it - a vast improvement because the card was only being read and not written to.  This wasn't good enough though.  I want NO microSD cards in my setup.  In a fit of desperation, I ended up buying a Netgear GS108Ev3 switch from the local Microcenter so I could have a switch w/ port mirroring to try and troubleshoot the issue further.  With this new switch, my Pi booted from the network WITHOUT AN SD CARD!!!!!

So I have two RPi 3 nodes online - rpi3-0 booting from a USB mSATA drive and rpi3-7 booting from the image served by rpi3-0.  Next step is to get all Pis netbooting (there may even be a way to get the Pi 2s to boot using the card/bootcode.bin method mentioned above).  I will be doing a full and detailed writeup once I have the whole thing up and running.  I just wanted to pass on my exciting results so far!

My YouTube Channel: https://www.youtube.com/user/KF7IJZ
Follow me on Twitter: https://twitter.com/KF7IJZ
