Anthony Tugman
E534
9/21/20
Assignment 3

The Square Kilometre Array (SKA) is a global, cross-disciplinary partnership striving to build the world's largest radio telescope. As the name suggests, the combined collecting area of its antenna fields will total roughly one square kilometre, spread across sites in Western Australia and South Africa, which are considered some of the most remote areas in the world [1]. As proposed in 2018, the SKA's goals include studying the various types of energy that exist in the universe, deepening our understanding of gravitational waves, studying the birth of the universe, and searching for extraterrestrial communication. The developers of the SKA have also made clear that this radio telescope will not only be larger than any in existence, it will also collect, analyze, and store data at a speed and in a quantity never seen before. The project is in the design phase and is constantly evolving to overcome technological and legislative barriers. As of September 2020, the SKA had passed its final reviews and is preparing to enter the construction phase [2].

The ambition of this project is highlighted by the amount of data to be processed, as well as how that data is processed. While still in the design phase, the global collaboration is attempting to predict any problems that may occur and plan accordingly. The SKA must use custom-designed hardware to handle the volume and speed of data processing the project requires. The manner in which this data is processed is entirely different from existing systems: instead of storing the collected data for later processing, the SKA will process data in real time and store only the results, discarding the intermediate data once processing is complete. Even so, the SKA will produce 700 PB of usable data annually between the two telescope arrays. Processing at this scale will require two supercomputers, each 25% more powerful than what was available in 2019, as well as broadband capabilities 100,000 times faster than what is expected to be available in 2022 [1].
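To put the 700 PB/year figure in perspective, a quick back-of-envelope calculation (my own arithmetic on the source's number, not an official SKA figure) shows the sustained write rate the archive alone implies:

```python
# Back-of-envelope: sustained archive rate implied by 700 PB of
# usable data per year (figure from the SKA project description).
SECONDS_PER_YEAR = 365 * 24 * 3600      # ~3.15e7 seconds
usable_bytes_per_year = 700e15          # 700 PB (decimal petabytes)

rate_bytes_per_s = usable_bytes_per_year / SECONDS_PER_YEAR
rate_gb_per_s = rate_bytes_per_s / 1e9

print(f"Sustained archive rate: {rate_gb_per_s:.1f} GB/s")
# → roughly 22 GB/s, continuously, every second of the year
```

Even after discarding all intermediate products, the archive must absorb on the order of 22 GB every second without pause, which helps explain why off-the-shelf hardware is not considered sufficient.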

Within the scope of this course, the big data challenges created by this system's construction are most important to focus on. Undoubtedly, a custom software and computing system must be developed. Although the data storage pipeline was described above, it is important to note that small research stations across the globe will communicate with a large data center serving each field of telescopes. To accomplish this, the designers expect to use a variety of computer systems, including custom PCBs, FPGAs, and commodity servers. The SKA team hopes to piggyback on existing infrastructure in the host countries to avoid starting from the ground up [3].

The first big data challenge stems from the volume of data to be collected and stored: per the use case, data is to be collected at 700 petabytes per year and stored indefinitely (approximately 50 years) [3]. The velocity of the data transfer is staggering as well. The data is time dependent and needed at a constant pace for computation and refinement; as proposed, it will be transferred at approximately 1 TB/sec [3]. According to the use case, there is not much variety in the type of data transferred: in most cases it will be images produced by the telescope with attached time metadata. If researchers at a satellite office need access to other types of data, or to more raw data, they will be able to reach it through a virtual protocol, as the data will not be stored on site [3]. The project designers seemed less sure about the variability of the data. Variability differs from variety in that it refers to variation in the rate and nature of the data gathered. Here variability was described as occasional, most often when the telescope changes the type of observation it is performing; there may also be variability when the system is under strain, such as during maintenance [3].
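The volume and velocity figures above can be combined into a rough sizing sketch (again, my own arithmetic on the use-case numbers, not official projections):

```python
# Rough sizing from the use-case figures: 700 PB/year stored for
# ~50 years, with data moving through the system at ~1 TB/s.
STORED_PB_PER_YEAR = 700
RETENTION_YEARS = 50
TRANSFER_BYTES_PER_S = 1e12                                # ~1 TB/s

archive_eb = STORED_PB_PER_YEAR * RETENTION_YEARS / 1000   # PB -> EB
daily_transfer_pb = TRANSFER_BYTES_PER_S * 86400 / 1e15    # bytes/day -> PB

stored_bytes_per_s = STORED_PB_PER_YEAR * 1e15 / (365 * 24 * 3600)
moved_vs_stored = TRANSFER_BYTES_PER_S / stored_bytes_per_s

print(f"Lifetime archive: {archive_eb:.0f} EB")     # 35 EB
print(f"Moved per day:    {daily_transfer_pb:.1f} PB")
print(f"Moved vs stored:  ~{moved_vs_stored:.0f}x")
```

The roughly 45-to-1 gap between bytes moved and bytes kept illustrates why the real-time, discard-the-intermediates design is necessary: storing everything that flows through the system would be wholly impractical.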

Additionally, the project designers stressed maintaining the quality of the data and building a troubleshooting pipeline for when errors occur. All systems of the SKA will retain remnants of data and proper logs to ensure the data's integrity. This applies not only to the raw data being collected but also to all other components, including the telescope equipment, sensors, and transmission equipment. Overall, the data will be verified by a combination of physical human checks and automated computer checks [3]. Unlike the previous sections, the project designers did not make entirely clear what type of data they expect after all collection and processing is complete. Out of curiosity, I did a quick web search to see what data radio telescopes typically generate, and it is quite varied. In any case, the project designers are confident that their system will be able to handle any requests or queries the researchers make. As a final consideration, the quality and security of the data must be preserved. The SKA governing board seems the most likely and appropriate body to manage the data generated. The project designers point out that the data is initially available only to member agencies for the first 18 months, after which it is released to the general science community. In certain situations or countries, however, the data is subject to rulings by the local governing board, with which the SKA governing board will comply [3]. To conclude this analysis of the big data problems faced by the SKA project, they appear common to most projects involving big data: the primary concerns arise from data collection, storage, integrity, and security. The provided use case makes clear that the SKA project understands many of the roadblocks it must overcome; however, in many situations it does not spell out a clear plan or solution. Many of the provided responses were vague, but with a coalition of scientists across the international community in support, I am confident the team will be able to reach its goals.

The science goals of the SKA project were mentioned briefly above, but each is worth examining in more detail. The first use case is research into the evolution of galaxies, dark energy, and the accelerating expansion of the universe. To accomplish this goal, the SKA will monitor the distribution of hydrogen throughout the universe, particularly at the edge of the known universe, in hopes of watching new galaxies be born. Radio emission from the sky was first detected by Karl Jansky in the early 1930s; hydrogen can be monitored because it returns a characteristic frequency to the radio telescope. The SKA has the added advantage of being able to see farther and more precisely, gathering a better understanding of what is occurring in the galaxy [4]. In this section, the project designers carefully clarify what is meant by the sensitivity and resolution of the SKA. Sensitivity is the measure of the minimum signal a telescope can distinguish above background noise; the SKA's sensitivity comes from combining radio receivers at the low, mid, and high frequency ranges to effectively create a single radio telescope 1 km wide. Resolution is the measure of the minimum size a telescope can distinguish, or the cutoff at which the telescope produces a blurry image rather than a clear one. The SKA's receivers are spread relatively far apart, which helps increase the telescope's resolution [4].
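The resolution claim can be illustrated with the standard diffraction-limit formula θ ≈ λ/D. Here I use the well-known 21 cm (1420 MHz) neutral-hydrogen line and the "1 km wide" effective aperture quoted above; the numbers are illustrative, not official SKA specifications:

```python
import math

# Diffraction-limited angular resolution: theta ≈ lambda / D (radians).
# 21 cm is the well-known neutral-hydrogen line; the 1 km effective
# aperture comes from the text above (larger baselines do better).
WAVELENGTH_M = 0.21        # 21 cm hydrogen line
APERTURE_M = 1000.0        # "single radio telescope 1 km wide"

theta_rad = WAVELENGTH_M / APERTURE_M
theta_arcsec = math.degrees(theta_rad) * 3600

print(f"Resolution at 21 cm with a 1 km aperture: {theta_arcsec:.0f} arcsec")
# → about 43 arcsec
```

Because θ shrinks in proportion to the baseline D, spreading receivers over tens or hundreds of kilometres sharpens the image dramatically, which is exactly why the dispersed layout improves the telescope's resolution.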

The second use case is further exploring the magnetic fields that exist in our galaxy. Magnetic fields are entirely invisible, even to the largest telescopes, so researchers instead look at various concentrations of radiation. Here the researchers make an interesting note: they can anticipate the telescope's performance in studying the known universe, but they expect the SKA to open new research questions never before considered. As in the previous use case, the SKA's resolution will give researchers a view not previously achieved. The overall goal is to develop a map of the magnetic field across the known universe [5].

The third use case is studying the cosmic dawn. With current technology, such as the Hubble Space Telescope, researchers have only been able to study back to roughly the first 300,000 years after the Big Bang. By studying the cosmic microwave background from that period, they can better understand how the universe developed. However, the next half billion years gives even better insight into the scale of the structures created, as well as how they began to form and collapse under gravity. The SKA will allow researchers to see into this period. For this use case, the sensitivity of the SKA is most important [6].


  1. “The SKA Project - Public Website”, Public Website, 2020. [Online]. Available: https://www.skatelescope.org/the-ska-project/. [Accessed: 24-Sep-2020].

  2. “SKA completes final reviews ahead of construction - Public Website”, Public Website, 2020. [Online]. Available: https://www.skatelescope.org/news/ska-completes-final-reviews-ahead-of-construction/. [Accessed: 24-Sep-2020].

  3. “Use Case Survey_SKA”, Google Docs, 2020. [Online]. Available: https://docs.google.com/document/d/1ZMrga5R_idBcFlhvvhlcOP9aX--bpqhJg4XSS3Qi3ws/edit. [Accessed: 24-Sep-2020].

  4. “Galaxy Evolution, Cosmology and Dark Energy - Public Website”, Public Website, 2020. [Online]. Available: https://www.skatelescope.org/galaxyevolution/. [Accessed: 24-Sep-2020].

  5. “Cosmic Magnetism - Public Website”, Public Website, 2020. [Online]. Available: https://www.skatelescope.org/magnetism/. [Accessed: 24-Sep-2020].

  6. “Probing the Cosmic Dawn - Public Website”, Public Website, 2020. [Online]. Available: https://www.skatelescope.org/cosmicdawn/. [Accessed: 24-Sep-2020].