Web

1. The W3C

The World Wide Web Consortium (W3C) is a body that “regulates” the web. It standardizes web technologies (HTML, CSS, and so on), and accepts proposals for new ones from its members and the general public.

1.1. The process:

  • Public expression of interest
  • If there’s enough interest, a group is created for the topic
  • The charter of the group is defined (if necessary)
  • The group produces specifications and guidelines
  • An advisory committee decides if or when they can be published as recommendations

2. The TCP/IP model

The model is made up of multiple layers, top to bottom: application, transport, internet, and link.

2.1. App

3. Internet architecture

ISPs pay other networks for transit (carrying their traffic).

The ISPs that make up the internet can be split up into tiers:

3.1. Tier 1 ISP

These guys manage the internet’s operating infrastructure (the backbone networks).

3.2. Tier 2 ISP

Local ISPs? Ziggo, etc

3.3. Tier 3 ISPs

4. Resource identification

4.1. HTTP/Hypertext

4.2. Hypercard

4.3. URI

This is a reference identifying an abstract or physical resource.

Can be a URL(ocator), URN(ame), or both.

Can really take any form, depending on context. For example, a physical address URI can be a postcode. In file storage systems, this can be a path.

4.3.1. URL

A subset of URIs that identifies resources by their primary access mechanism (for example, their network location).

It’s an address that tells you where a resource lives and how to get to it, e.g. https://example.com/index.html.

4.3.2. URN

Identifies a resource independent of its primary storage location.

The identifier stays the same even if the resource is moved somewhere else.

These are basically just names: they say what a resource is, not where it is, e.g. urn:isbn:0451450523.
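
To make the URL/URN split a bit more concrete, here is a minimal sketch that classifies a URI by its scheme. The function names and the list of "locator" schemes are my own assumptions for illustration, and it only looks at scheme-based URIs (a bare postcode or file path ends up as "other").

```typescript
// Rough classification of a URI by its scheme (the part before the first ':').
// LOCATOR_SCHEMES is an illustrative assumption, not a complete registry.
type UriKind = "URL" | "URN" | "other";

const LOCATOR_SCHEMES: string[] = ["http", "https", "ftp", "file"];

function schemeOf(uri: string): string | null {
    // A scheme starts with a letter, followed by letters, digits, '+', '-' or '.'.
    const match = /^([A-Za-z][A-Za-z0-9+.-]*):/.exec(uri);
    const scheme = match ? match[1] : undefined;
    return scheme ? scheme.toLowerCase() : null;
}

function classifyUri(uri: string): UriKind {
    const scheme = schemeOf(uri);
    if (scheme === null) return "other";
    if (scheme === "urn") return "URN"; // a name, independent of location
    return LOCATOR_SCHEMES.indexOf(scheme) !== -1 ? "URL" : "other";
}

console.log(classifyUri("https://example.com/notes/web.html")); // URL
console.log(classifyUri("urn:isbn:0451450523"));                // URN
console.log(classifyUri("not a uri"));                          // other
```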

5. HTTP

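HyperText Transfer Protocol. HTTP is based on TCP, with default port 80 (443 for HTTPS). Because TCP is reliable and connection-oriented, so is HTTP.

Messages exchanged through HTTP consist of a header and a MIME-like body. MIME (Multipurpose Internet Mail Extensions) is a standard from the email world for describing what kind of content a message carries; HTTP borrows its content types (e.g. text/html).

HTTP guarantees a response for every request.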
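
The header-plus-body structure is easier to see when written out. This is a minimal sketch that builds the text of an HTTP/1.1 request; the HttpRequest type, the formatRequest function, and the host, path and header values are all made up for illustration.

```typescript
// Builds the text of an HTTP/1.1 request: start line, headers, blank line, body.
// The host, path and header values below are hypothetical.
interface HttpRequest {
    method: string;
    path: string;
    headers: Record<string, string>;
    body: string;
}

function formatRequest(req: HttpRequest): string {
    const startLine = `${req.method} ${req.path} HTTP/1.1`;
    const headerLines = Object.keys(req.headers).map(
        (name) => `${name}: ${req.headers[name]}`
    );
    // An empty line separates the header section from the body.
    return [startLine, ...headerLines, "", req.body].join("\r\n");
}

const request: HttpRequest = {
    method: "POST",
    path: "/notes",
    headers: {
        "Host": "example.com",
        "Content-Type": "text/plain", // the MIME-style content type of the body
        "Content-Length": "12",       // length of the body in bytes
    },
    body: "Hello world!",
};

console.log(formatRequest(request));
```

A response has the same shape, except that the first line is a status line (for example "HTTP/1.1 200 OK") instead of a request line.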

HTTP implements the client-server architectural pattern:

5.1. Client

A program that submits requests.

5.2. Server

A program that receives requests, processes them, and sends a response back to the client.

5.3. Origin server

A server where a given resource resides or is to be created.

5.4. Proxy

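An intermediary program that acts as both a server and a client, making requests on behalf of other clients. A sort of stand-in for the client.

With a proxy active, the client sends its requests to the proxy, which then communicates with the origin server (unless the proxy can answer the request itself). The proxy may rewrite all or part of the message.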

5.5. Gateway

A gateway acts as an intermediary for another server. Unlike a proxy, it receives requests as if it were the origin server, and the client is none the wiser.

5.6. Tunnel

Basically just a relay. It can be thought of as a sort of cable that joins two connections. The tunnel does not know the contents of the messages, it just passes them on.

5.7. Connection management

  • HTTP/1.0: connection is terminated after a single exchange
  • HTTP/1.1: connection is persistent by default and is terminated after multiple exchanges
  • HTTP/2: parallel exchanges multiplexed over a single connection

5.8. Types of content negotiation

5.8.1. Server-driven

5.8.2. Client-driven

5.9. Protocol nesting

Internet packets can be made up of multiple layers, from outside to inside:

5.9.1. Ethernet packet

Device to router communication.

Connection via a physical MAC address.

5.9.2. IP packet

This connects two devices via their IP addresses.

5.9.3. TCP Packet

Connects via port

5.9.4. HTTP packet

5.9.5. HTML
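
The nesting described in this section can be sketched as nested types, where each layer carries the next one as its payload. All type names, field names, addresses and ports here are made-up illustration values, and real headers have many more fields.

```typescript
// Each layer wraps the payload of the layer inside it.
// All addresses, ports and field names are hypothetical.
interface HttpMessage { startLine: string; body: string }  // body carries the HTML
interface TcpSegment { sourcePort: number; destinationPort: number; payload: HttpMessage }
interface IpPacket { sourceIp: string; destinationIp: string; payload: TcpSegment }
interface EthernetFrame { sourceMac: string; destinationMac: string; payload: IpPacket }

const frame: EthernetFrame = {
    sourceMac: "00:00:5e:00:53:01",
    destinationMac: "00:00:5e:00:53:02",
    payload: {
        sourceIp: "192.0.2.1",
        destinationIp: "198.51.100.7",
        payload: {
            sourcePort: 49152,
            destinationPort: 80, // default HTTP port
            payload: {
                startLine: "HTTP/1.1 200 OK",
                body: "<h1>Hello world!</h1>",
            },
        },
    },
};

// Unwrap from the outside in, the way a receiver would.
function unwrapHtml(f: EthernetFrame): string {
    const ipPacket = f.payload;
    const tcpSegment = ipPacket.payload;
    const httpMessage = tcpSegment.payload;
    return httpMessage.body;
}

console.log(unwrapHtml(frame)); // <h1>Hello world!</h1>
```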

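6. C#

C# is an object-oriented programming language often used for web development backends. It is similar to Java and C.

In C#, the types of local variables don’t need to be written out: with var, the compiler infers the type. The variable is still statically typed.

var text = "Hello world!";

Instead of writing explicit getters and setters, we can just use auto-implemented properties: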

// Auto-implemented property: the compiler generates the backing field
public class Car {
    public string Model { get; set; }
}

Car car = new Car();
car.Model = "Mazda CX5";

C# runs on .NET. What is .NET? A runtime that comes with a very feature-rich standard library.

7. JavaScript

A language that runs in the browser and, through runtimes such as Node.js, is also used to develop website backends.

ECMAScript is the specification; JavaScript is the implementation, including APIs.

7.1. TypeScript

Statically typed JavaScript: a superset of JavaScript that adds type annotations and compiles down to plain JavaScript.

function shout(): void { // TypeScript adds the void return type
    let text = "Hello world!";
    console.log(text);
}

shout();

8. REST

REpresentational State Transfer. This is an architectural style for building large-scale distributed hypermedia systems.

REST is essentially a set of constraints on system architecture:

  • Resource identification (RI) (URIs)

    Name everything that you want to talk about: every resource gets its own identifier (a URI), as long as you want other parties to be able to refer to it.

  • Uniform interface (UI) (HTTP verbs)

    The same small set of operations applies to all resources, so a client that knows the verbs can work with any resource without learning a resource-specific interface.

    The uniform interface covers the CRUD operations:

    • Create
    • Retrieve
    • Update
    • Delete
  • Self Describing Messages (SDMs)
  • Hypermedia as the Engine of Application State (HATEOAS)
  • Stateless Interactions (SI): the server keeps no session state between requests, so state is moved to the clients themselves.
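
As a rough sketch of how the uniform interface lines up with CRUD, here is one common mapping of the four operations onto HTTP verbs. The mapping is a convention rather than something REST itself prescribes, and the "cars" collection and its URIs are hypothetical.

```typescript
// One common mapping of CRUD operations onto HTTP verbs.
// The "cars" resource and its URIs are made up for illustration.
type Crud = "create" | "retrieve" | "update" | "delete";

const VERB_FOR: Record<Crud, string> = {
    create: "POST",
    retrieve: "GET",
    update: "PUT",
    delete: "DELETE",
};

function requestLine(op: Crud, collection: string, id?: string): string {
    // Creating targets the collection; with an id, the other operations target one resource.
    const uri = op === "create" || id === undefined
        ? `/${collection}`
        : `/${collection}/${id}`;
    return `${VERB_FOR[op]} ${uri}`;
}

console.log(requestLine("create", "cars"));         // POST /cars
console.log(requestLine("retrieve", "cars", "42")); // GET /cars/42
console.log(requestLine("update", "cars", "42"));   // PUT /cars/42
console.log(requestLine("delete", "cars", "42"));   // DELETE /cars/42
```

The same four verbs work for any resource name passed in, which is the point of a uniform interface.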

REST tries to ensure the following goals:

  • Scalability

    The ability of a system to cope with an increasing number of users.

  • Simplicity

    Ease of use for any kind of user.

  • Data independence

    Data of a single resource may be available in different formats; users choose the representation they want.

  • Performance

8.1. Maturity model

Richardson’s maturity model grades how RESTful an API is, from level 0 to level 3.

8.1.1. Level 0

8.1.2. Level 1

8.1.3. Level 2

8.1.4. Level 3

Everything that you do has to have links: each response tells the client, through hypermedia links, what it can do next (HATEOAS).
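
A minimal sketch of what a level 3 style response could look like: the data plus links to the actions that are currently possible. The "order" resource, its fields, the link relation names and the URIs are all hypothetical; real APIs often use an established hypermedia format instead of a hand-rolled one.

```typescript
// A level 3 style response: the data plus links to what the client can do next.
// The "order" resource, its fields and link relations are made up for illustration.
type OrderStatus = "open" | "paid";

interface Link { rel: string; href: string; method: string }
interface OrderResponse { id: number; status: OrderStatus; links: Link[] }

function describeOrder(id: number, status: OrderStatus): OrderResponse {
    const links: Link[] = [{ rel: "self", href: `/orders/${id}`, method: "GET" }];
    if (status === "open") {
        // Only an open order can still be paid or cancelled.
        links.push({ rel: "payment", href: `/orders/${id}/payment`, method: "PUT" });
        links.push({ rel: "cancel", href: `/orders/${id}`, method: "DELETE" });
    }
    return { id, status, links };
}

console.log(JSON.stringify(describeOrder(7, "open"), null, 2));
console.log(JSON.stringify(describeOrder(7, "paid"), null, 2));
```

The client does not need to know the URI layout in advance: it follows the links it is given, and the set of links changes with the state of the resource.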

9. API design

An API is the set of signatures that are exported and available to the users of a library or a framework to write their applications. An API should also always include statements about the program’s effects and/or behaviours.

This is basically a contract that tells you how your application can use that library or framework.

For example

*

10. Java Spring

Java Spring is a backend web development framework providing us a way to construct RESTful APIs.

10.1. Integration with MongoDB

The assignment requires us to query various parts of a dataset. To do that, we’ll load the dataset into a MongoDB database.

11. Review questions

If you’re able to answer all of these questions, you’ll have no trouble passing the exam.

-The lecturer

11.1. Lecture 1

  • How are the terms HyperText and HyperMedia defined?
  • What initiatives and tools are considered the precursors of the web?
  • What are the steps and stakeholders involved in W3C’s standardization process?

11.2. Lecture 2

  • What do the acronyms ISP, POP, and IXP stand for and what is their function?
  • How are the terms URI, URL, and URN defined, what is their purpose?
  • What is the difference between a proxy, a gateway, and a tunnel server?
  • Under which conditions are HTTP requests safe/idempotent? Which HTTP methods are considered safe/idempotent?
  • What are the differences between PUT and POST in terms of request URI semantics, and pragmatics (i.e. how they are to be used)?
  • Which proxy types are common on the Web, and what is their function?
  • What are the benefits of caching?
  • How is the principle of Semantic Transparency for HTTP defined? Under which conditions is it violated by the use of caches?
  • How can a server stop a proxy or client from caching a response?
  • How does the Basic authentication scheme for HTTP work? Under which conditions should it be used?
  • How do CDNs work and what benefits do they offer?

11.3. Lecture 3

  • Which constraints define the REST architectural style, and how are they related to each other?
  • How are the REST style constraints related to its goals?
  • What are the maturity levels in Richardson’s model? Which REST principles are they related to?
  • What are the principles that should govern the design of RESTful APIs?
  • What steps should be followed for the design of a RESTful API according to Masse’s book?

11.4. Lecture 4

  • What is the difference between text elements with semantics and those without in HTML? Which ones are recommended to be used, and why?
  • How was embedded video and audio handled up to HTML5? How are they handled by HTML5? What are the implications of this mechanism?
  • What are the types of HTML form validation available, and when are they to be used?
  • How is the cascade mechanism used to resolve conflicts between CSS rules?
  • Under which conditions would you recommend the use of an SPA-style application?
  • What options are available for creating server-side dynamic web pages?

11.5. Lecture 5

  • What are the functionalities encapsulated in each of the application layers in the respective pattern by Fowler?
  • What is the relation between layers and tiers? Which architectural decision is important for this relation?
  • Where does the application split lie in the case of remote presentation/distributed/remote data applications? Name examples of such types of applications.
  • How is the MVX pattern defined for JavaScript-using web applications, where X = {C,P,VM}?
  • What are the main constituents of a service in SOA, and what is their purpose?
  • What are the main interaction roles in the SOA triangle, and what operations are they supporting?

12. Review answers

12.1. Lecture 1

  • HyperText: HyperText is text with links embedded in it.

    HyperMedia: HyperMedia is a form of HyperText not constrained to being text. It can also take the form of pictures, audio, or video (a non-exhaustive list).

  • Some precursors of the web:
    • The memex machine
    • The oN-Line System (NLS)
    • Project Xanadu
    • NoteCards
    • The ENQUIRE system at CERN
    • The “Information Management: A Proposal” document
  • The stakeholders:

    • The ones who host the W3C
    • The users of the internet

    The process:

    • Expression of interest by members/the public
    • If interest is high enough, a new Activity or Working Group is formed
    • The charter for the group is defined if necessary
    • The group produces the specifications and guidelines in cycles of review/revision
    • The advisory committee decides if/when they can be published as recommendations

12.2. Lecture 2

  • ISP stands for Internet Service Provider. An ISP is responsible for providing internet access to end consumers. ISPs come in tiers, with tier 1 ISPs providing the underlying network infrastructure.

    POP stands for Point of Presence. POPs are the locations where packets sent by end users enter the ISP’s network. A POP can, for example, be a box on the street connecting multiple houses to the local network.

    IXP stands for Internet eXchange Point. They are basically what the name says: locations where participating ISPs exchange data destined for each other’s networks. An IXP can be thought of as a sort of switch through which internet packets are routed.

  • URI (Uniform Resource Identifier): a reference that identifies an abstract or physical resource. A URI can be a URL, a URN, or both. A URI can really be anything that identifies something.

    URL (Uniform Resource Locator): the subset of URIs that identify a resource by its primary access mechanism, for example a web address or a file path. A URL changes when the location of the resource changes. URLs are the most common form of URI.

    URN (Uniform Resource Name): the subset of URIs that identify a resource independently of its primary storage location. This is basically a name, for example the ISBN of a book. A URN does not change when the resource is moved.

  • A proxy acts on behalf of the client. The client sends its request to the proxy, not to the origin server itself. The proxy takes the request, may rewrite part or all of it, and then passes it on to the origin server (unless the proxy can handle the request itself).

    A gateway acts as an intermediary for another server. When communicating with a gateway, the client interacts with it as if it were the origin server.

    A tunnel is a simple link between two connections. The tunnel does not check or change the data being passed through.

  • HTTP requests are idempotent if making the same request multiple times has the same effect on the server as making it once. Idempotent requests may alter server-side data: a DELETE, for example, leaves the resource deleted no matter how often it is repeated. The idempotent methods are GET, HEAD, OPTIONS, TRACE, PUT, and DELETE.

    HTTP requests are considered safe when they do not alter server-side state. Safe requests can be described as simple queries (retrieval requests), with no obligation for the server to do anything else. The HTTP methods GET, HEAD, OPTIONS, and TRACE are considered safe. Every safe method is also idempotent.

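  • POST is used to create new data under a resource; PUT is used to create or replace the resource at a known URI. With POST, the request URI names the resource that will handle the enclosed data (for example, a collection), and the server decides where the new resource ends up. With PUT, the request URI names the exact resource to act on. PUT is idempotent; POST is not.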
  • There are multiple proxy types. Among them are:
    • Forward proxy

      A forward proxy acts on behalf of the client. A client sends requests to the proxy, which then sends them on to the internet. This is useful for hiding a client’s IP address and for bypassing imposed restrictions. In effect, it is similar to a VPN.

    • Reverse proxy

      A reverse proxy sits in front of the server and intercepts requests coming from clients over the internet before they arrive at the server. A reverse proxy can be used for load balancing, protection against DDoS attacks, hiding server IPs, and caching of static content, among other things. All of this assumes that the proxy itself can handle the traffic.

    • Web accelerator

      A proxy with prefetching and compression, speeding up encryption, image manipulation, etc.

  • Caching stores a response so that later requests for the same thing can be answered faster. It reduces network bandwidth usage, which reduces costs for providers and consumers. It decreases perceived server-side delays. It also takes load off the origin server, which reduces costs for the provider and means non-cached resources are served faster too. This seems a bit counterintuitive, but it makes sense: with caching, more server-side resources are left over for handling non-cached requests.

    An issue with caching is cache invalidation. This is when a response is cached but the data changes server-side, so the cached copy becomes outdated.

  • The principle of Semantic Transparency is a fundamental HTTP design principle. It is defined by these rules:
    • The use of a cache must have no impact on the client or the origin server
    • Each request produces the same response as if it had been served by the origin server itself
    • Deviations are only tolerable on explicit request from the client or server; otherwise a warning must be produced. Serving a stale cached response without either violates the principle
  • To stop a response from being cached, a server can set a past date as the expiry date (the Expires header). This basically sets the cache lifetime to 0, forcing the response to be immediately outdated. The Cache-Control header offers a more direct way, e.g. Cache-Control: no-store.
  • The basic HTTP authentication scheme works as follows:

    Credentials are sent as a user ID/password pair (user:password) encoded in base64. Base64 is an encoding, not encryption, so anyone who intercepts the string can easily extract the credentials. On its own this is very insecure, so it should only be used over HTTPS connections, or when security is not a concern.

  • A CDN (Content Delivery Network) is a network of servers strategically distributed across various geographical locations. It works by distributing content across those servers, routing each user request to the most suitable CDN server, delivering cached content if available, and keeping that content up to date. The benefits are reduced latency due to fewer network hops, scaling with demand, increased reliability, abstraction from data transfer, and protection against DDoS attacks and large traffic surges. CDNs are useful for major websites, ensuring that users across the globe can access content at (roughly) the same speed.
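
The safe/idempotent classification from the answers above can be written down as a small lookup table. This is only a sketch: the names METHODS and canRetry are my own, and POST is included for contrast as the usual example of a method that is neither safe nor idempotent.

```typescript
// Safe and idempotent properties of the common HTTP methods, as a lookup table.
interface MethodProperties { safe: boolean; idempotent: boolean }

const METHODS: Record<string, MethodProperties> = {
    GET:     { safe: true,  idempotent: true },
    HEAD:    { safe: true,  idempotent: true },
    OPTIONS: { safe: true,  idempotent: true },
    TRACE:   { safe: true,  idempotent: true },
    PUT:     { safe: false, idempotent: true },
    DELETE:  { safe: false, idempotent: true },
    POST:    { safe: false, idempotent: false },
};

// Repeating a request automatically is only harmless if the method is idempotent.
function canRetry(method: string): boolean {
    const props = METHODS[method.toUpperCase()];
    return props !== undefined && props.idempotent;
}

const safeMethods = Object.keys(METHODS).filter((m) => canRetry(m) && METHODS[m]?.safe === true);

console.log(safeMethods.join(", ")); // GET, HEAD, OPTIONS, TRACE
console.log(canRetry("DELETE"));     // true
console.log(canRetry("POST"));       // false
```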

12.3. Lecture 3

  • The constraints that define the REST architectural style are:
    • Each resource in a RESTful system must be uniquely identified by a URI
    • Relies on a standardized usage interface through the use of HTTP verbs (GET,PUT,POST,…)
    • RESTful systems use standard media types like JSON or XML from which enough information can be obtained to determine how to process them
    • Hypermedia as the Engine of Application State (HATEOAS). This means that the client interacts with the app entirely through hyperlinks provided dynamically by server responses
    • Stateless interaction. Each request from the client must contain all the information needed to understand and process the request. The server does not store any session state about the client

    They are related in that they build on each other: URIs give the uniform interface something to act on, self-describing messages and statelessness let each request be handled on its own, and HATEOAS ties these together by telling the client which URIs and verbs it can use next.

  • The goals of REST are scalability, performance, reliability, visibility, and separation of the representation and the resources.

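    The constraints above work together to achieve these goals. Roughly: stateless interaction helps scalability (any server can handle any request), the uniform interface and self-describing messages help visibility (intermediaries can understand what a request does), and negotiable representations keep the representation separate from the resource.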

  • The REST maturity model comprises four levels:

    • Level 0 uses HTTP purely as a transport system for remote interactions (plain old XML)
    • Level 1 incorporates the unique identification of resources using URIs
    • Level 2 includes the correct use of HTTP verbs to interact with resources
    • Level 3 integrates hypermedia controls (HATEOAS & SI)

    They relate to the principles as follows: level 1 to resource identification, level 2 to the uniform interface, and level 3 to HATEOAS and stateless interactions.

  • The principles:
    • The key abstraction of information is a resource
    • A resource representation is a sequence of bytes plus representation metadata; the representation is negotiable
    • All interactions are context-free
    • Components can perform only a small set of well-defined methods
    • Idempotency of operations and representation metadata is encouraged
    • The presence of intermediaries is promoted

12.4. Lecture 4

  • Elements with semantics are easier for web browsers and third-party readers (such as screen readers) to interpret. They make the structure of a page easier to understand, even though they may look the same as elements without semantics. It is generally recommended to use elements with semantics instead of those without whenever they are available, in the name of accessibility.
  • Up until HTML5, video and audio were handled by non-native web technologies such as Flash. This has since been replaced by native HTML mechanisms for media, such as the <audio> and <video> elements.
  • Form validation is meant to be used when the client passes data to the server via a form. Data coming from forms is not to be trusted, since users can enter anything.

    Validation can take two forms:

    • Client-side. This is done in the browser before submitting, through the use of JavaScript or the built-in form validation of HTML5.
    • Server-side. This is done through the application/server. It’s less user friendly, but it’s supported by most server-side frameworks and, unlike client-side validation, cannot be bypassed by the user.
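  • CSS resolves conflicts by using a cascade-type system. When several rules apply to the same element, the most specific one wins (and between equally specific rules, the one declared last). From highest priority to lowest:
    • Style attribute within the HTML element
    • Element ID
    • Class
    • Element type (tag) style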
  • The use of an SPA (Single Page Application) is recommended when you want to have a highly reactive web page with rich, complex UI requirements.
  • Some options for rendering pages server-side are:
    • Angular Universal
    • Next.js
    • Nuxt.js

12.5. Lecture 5

  • The presentation layer, the application logic layer, and the resource layer.
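  • Tiers are physical organizational units (the machines the application is deployed on); layers are functional organizational units (the logical parts of the software). Layers get mapped onto tiers.
  • The split lies at a different place for each type. As far as I understand: with remote presentation, it lies between the presentation layer (on the client) and the application logic, e.g. a plain server-rendered website; with distributed application logic, it runs through the application logic layer itself, so both sides hold part of it, e.g. an SPA with a backend API; with remote data, it lies between the application logic (on the client) and the resource layer, e.g. a desktop client talking directly to a database server.
  • Model View Controller, Model View Presenter, Model View ViewModel
    • MVC: the controller handles user input and updates the model; the view displays the model.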
    • MVP has a presenter instead of a controller.

      The presenter retrieves data from the model and formats it for display in the view. Unlike the MVC controller, the presenter also handles most of the view’s output logic.

    • MVVM

      VM acts as an intermediary between the model and the view. It provides data binding between the view and the model. It handles most of the view’s display logic and its state.

  • The basic constituents of the Service Oriented Architecture (SOA) can be split into two parts:
    • Interface
      • Functionality visible to the external world
      • Means to access this functionality
      • Self-descriptive definition (easy to understand)
    • Implementation
      • Realizes specific service interface(s)
      • Multiple languages/platforms can be used
      • May use other services to implement functionality
  • The main roles of the SOA triangle are the Provider, the Consumer/Client, and the Registry. Their roles are described as follows:

    • Provider
      • Organization that owns the service and implements the underlying business logic
      • The platform hosting and controlling access to the service
    • Consumer/Client
      • An organization requiring certain functionality to be satisfied
      • An application or service that uses the service
    • Registry
      • Searchable directory where services are described
      • Clients can “discover” suitable services and get all necessary information to use them

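    The supported operations are:

    • Publish: the provider registers a description of its service in the registry
    • Find: the consumer searches the registry for a suitable service
    • Bind: the consumer connects to the provider and uses the service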
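
The publish/find/bind operations of the SOA triangle can be sketched with a tiny in-memory registry. Everything here is a simplification and an assumption on my part: the Registry class, the ServiceDescription shape and the "toUpper" service are made up, and a real registry would be a separate networked component.

```typescript
// Publish / find / bind with an in-memory registry.
// The service name, description and behaviour are hypothetical.
type Service = (input: string) => string;

interface ServiceDescription { name: string; description: string; endpoint: Service }

class Registry {
    private entries: ServiceDescription[] = [];

    // Publish: the provider registers a description of its service.
    publish(entry: ServiceDescription): void {
        this.entries.push(entry);
    }

    // Find: the consumer searches the descriptions for a keyword.
    find(keyword: string): ServiceDescription | undefined {
        return this.entries.filter((e) => e.description.indexOf(keyword) !== -1)[0];
    }
}

// Provider side
const registry = new Registry();
registry.publish({
    name: "toUpper",
    description: "converts text to upper case",
    endpoint: (input) => input.toUpperCase(),
});

// Consumer side. Bind: use the found description to call the provider directly.
const found = registry.find("upper case");
const result = found ? found.endpoint("hello world") : "no service found";

console.log(result); // HELLO WORLD
```

Note that the registry is only involved in publish and find; the actual call (bind) goes from the consumer straight to the provider.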

Author: Jay

Created: 2025-04-15 Tue 16:06