Text Formatting with HTML
Creating Web Pages with HTML5 and CSS3
In the last lesson, we looked at a number of tags that make up the basic structure of an HTML document and allow you to divide text into basic blocks: paragraphs, headings, and divs. As you know, these tags are not enough to create a full-fledged HTML page, which contains many different elements.
Today our task is to classify tags from two points of view: the space they occupy on the page, and their semantics, that is, the logical structure of the document markup.
Block and Inline Tags
We already examined the differences between block and inline elements in the last lesson. Let us return to this question once more, because it is important when marking up an HTML document.
Block elements are HTML elements that, by default, occupy all the available width of the browser window or of their parent element, even if their content is very small. Inline elements take up only as much space as their content needs, and they are placed inside block elements. Inline elements that follow each other in the markup are displayed next to each other on the same line, whereas each block element starts on a new line. In addition, block elements are intended for structuring a page, while inline elements give a specific function to a piece of text (a link, for example) or format it (<b> or <i>). Another difference is that some inline elements consist of only an opening tag (for example, <br> or <img>), whereas the block elements you will use most often have both an opening and a closing tag.
In CSS, the difference between these two groups comes down to different values of the same property: display: block for block elements and display: inline for inline elements.
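A minimal standalone page that makes the difference visible. The dashed outlines show how much space each element occupies, and the class names .as-inline and .as-block are hypothetical helpers added only to show that the behavior comes from the display property, so CSS can switch it.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Block and inline elements</title>
  <style>
    /* Outlines make the space each element occupies visible */
    p, div { outline: 1px dashed blue; }
    b, a, span { outline: 1px dashed red; }

    /* Hypothetical helper classes that override the default display value */
    .as-inline { display: inline; }
    .as-block { display: block; }
  </style>
</head>
<body>
  <!-- Block elements: each starts on a new line and takes the full width -->
  <p>First paragraph with <b>bold text</b> and <a href="#">a link</a> inside it.</p>
  <p>Second paragraph, rendered below the first one.</p>

  <!-- Inline elements written in a row stay on the same line -->
  <span>one</span> <span>two</span> <span>three</span>

  <!-- The same tags with their display value switched by CSS -->
  <div class="as-inline">a div shown inline</div>
  <div class="as-inline">next to another div</div>
  <span class="as-block">a span shown as a block</span>
</body>
</html>
```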
Let's also define what a parent element, or parent container, is. It is the element in which a given tag is nested. Most often this container is the <body> tag, because everything that should be displayed on the HTML page is placed inside it. However, the inline tags <del> and <ins>, for example, will most likely be placed inside a <p> tag, which will be their parent. In this case, <del> and <ins> are called nested, or child, elements.
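A short fragment illustrating the parent and child relationship; the text content is made up for illustration.

```html
<body>
  <!-- <body> is the parent of <p>; <p> is a child of <body> -->
  <p>
    <!-- <p> is the parent of <del> and <ins>, which are its children -->
    The meeting starts at <del>10:00</del> <ins>11:30</ins>.
  </p>
</body>
```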
Which elements belong to each category?
Table 1. Block Elements

| Tag | Purpose |
|---|---|
| html | Root element |
| body | Document body |
| h1-h6 | Headings |
| p | Paragraph |
| div | Content block |
| address | Contact information (address) |
| article | Article |
| aside | Side column |
| nav | Navigation on the page |
| main | Main content of the web page |
| header | Top of the page or section |
| footer | Bottom of the page, section, or block quotation |
| section | Page section |
| audio | Embedded audio |
| video | Embedded video |
| blockquote | Block quotation |
| caption | Table caption |
| colgroup | Group of table columns |
| table | Table |
| thead, tbody, tfoot | Row groups of a table |
| form | Form |
| datalist | List of predefined options for a form field |
| select | Drop-down list |
| fieldset | Group of form fields |
| legend | Caption for a fieldset |
| dl, ul, ol | Lists |
| canvas | Area for drawing graphics with JavaScript |
| details | Interactive element whose content can be shown or hidden |
| summary | Visible part of the details element |
| figure | Wrapper for an <img> element |
| figcaption | Caption for a figure element |
| map | Image map (clickable areas on an image) |
| iframe | Container that loads the content of another HTML page |
| pre | Preformatted text |
| progress | Progress indicator |

Note that, strictly speaking, some elements in this table (audio, video, canvas, iframe, map, select, and progress) are displayed as inline or inline-block by default, and the table parts (caption, colgroup, thead, tbody, tfoot) have their own table-specific display values.
Table 2. Inline Elements

| Tag | Purpose |
|---|---|
| a | Link |
| abbr | Abbreviation |
| b, strong | Bold text |
| i, em | Italic text |
| dfn | Definition |
| br | Line break |
| button, input, textarea | Form elements: button, text field, and multi-line text area |
| label | Label for a form element |
| cite | Title of a work, reference to a source |
| code | Program code |
| kbd | Keyboard input |
| samp | Sample output of a computer program |
| var | Variable in a program or formula |
| col | Table column |
| del | Deleted text |
| ins | Inserted text (often follows the <del> tag) |
| s | Strikethrough text, i.e. text that is no longer accurate or relevant. For text that was removed, use <del> instead |
| span | Container for styling part of the text with CSS |
| sub | Subscript (H₂O) |
| sup | Superscript (x²) |
| small | Smaller text |
| img | Image |
| mark | Highlighted text |
| meter | Gauge showing a value within a known range |
| q | Short quotation |
Note that today we will look at examples of only some of these tags, since we simply do not have enough time for all of them. In addition, some elements require several tags at once. For example, the <table>, <tr>, and <td> tags are required to build a table, and the <caption>, <colgroup>, <col>, <thead>, <tfoot>, and <tbody> tags can also be used inside it. To create a form, the <input>, <textarea>, <select>, and <button> tags are usually used in addition to the <form> tag, and each of these elements has its own features, which will be discussed in a separate topic.
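To show how several tags work together, here is a minimal sketch of a table that uses the optional <caption>, <colgroup>, <thead>, and <tbody> tags (plus <th> header cells), and of a simple form. The data, field names, and styling are hypothetical; forms will be covered in detail later.

```html
<table>
  <caption>Lesson schedule</caption>
  <colgroup>
    <col>
    <col style="background: #f0f0f0"> <!-- styles the whole second column -->
  </colgroup>
  <thead>
    <tr><th>Day</th><th>Topic</th></tr>
  </thead>
  <tbody>
    <tr><td>Monday</td><td>Block and inline tags</td></tr>
    <tr><td>Wednesday</td><td>Character entities</td></tr>
  </tbody>
</table>

<form action="#">
  <label for="email">E-mail:</label>
  <input type="email" id="email" name="email">
  <textarea name="comment" rows="3"></textarea>
  <select name="topic">
    <option>HTML</option>
    <option>CSS</option>
  </select>
  <button type="submit">Send</button>
</form>
```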
Today, we need to understand the differences between inline and block tags and see how browsers display them when formatting an html page.
Examples can be found in separate documents:
- Block tags;
- Inline tags.
Topic-related links:
- HTML Block and Inline Elements;
- HTML Inline and Block Elements;
- Difference between block elements and inline elements;
- Block and inline layout in normal flow;
- Block-level content;
- Inline-level content.
Physical and Logical Formatting Tags
In the early days of HTML, tags were added that, as a rule, only changed the visual appearance of text. These include the <b> and <i> tags, which we have already discussed and which are still in use: they make text bold and italic. There is also the <u> tag, which underlines text, and the <s> tag, which strikes it through; <small> makes text smaller, and the <sup> and <sub> tags turn text into a superscript and a subscript, respectively. Another tag, <pre>, displays text in the HTML document exactly as it was typed in a text editor, i.e. preserving all line breaks, tabs, and extra spaces.
All these tags change the physical appearance of the text on the HTML page, so they are classified as physical formatting tags.
In contrast, logical formatting tags also change the appearance of the text on the page, but they describe the meaning of the text, and their names are usually derived from an English word or part of one. For example, the <del> tag comes from the English word delete and marks text that has been removed in a newer version of the document. Browsers display it struck through. The <dfn> tag comes from the English word definition and is intended for definitions; the browser displays it in italics (Fig. 1).
Figure 1
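A small fragment contrasting the two groups; the text and prices are invented for illustration.

```html
<!-- Physical formatting tags: they describe how the text looks -->
<p>
  <b>bold</b>, <i>italic</i>, <u>underlined</u>, <s>struck through</s>,
  <small>smaller</small>, x<sup>2</sup>, H<sub>2</sub>O
</p>
<pre>
Text inside pre    keeps extra spaces
    and line breaks.
</pre>

<!-- Logical formatting tags: they describe what the text means -->
<p>Price: <del>$50</del> <ins>$40</ins></p>
<p><dfn>HTML</dfn> is the markup language used to structure web pages.</p>
```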
HTML5 added many logical tags that define the semantic structure of a document: <header> for the header of a page or article, <main> for the main part of the document, <section> for a section of the document, <article> for an article, and <footer> for the bottom of the document or article. These tags were introduced to replace divs with corresponding classes or ids and to make the logic of the page markup explicit. In this context, the word semantics, which you will often see in articles about HTML markup, means that the markup reflects the meaning and structure of the content, not just its visual formatting. Semantic markup matters both for search engines and for screen readers, programs that read web pages aloud for visually impaired people.
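A sketch of the same page skeleton written twice: first with divs only, then with HTML5 semantic tags. The class names, link targets, and headings are hypothetical.

```html
<!-- Before HTML5: the structure is visible only through class names -->
<div class="header">Site name</div>
<div class="main">Page content</div>
<div class="footer">Contacts</div>

<!-- HTML5: the tags themselves describe the structure -->
<header>
  <nav>
    <a href="#">Home</a>
    <a href="#">Lessons</a>
  </nav>
</header>
<main>
  <section>
    <h2>Text formatting</h2>
    <article>
      <h3>Block and inline tags</h3>
      <p>Article text...</p>
    </article>
  </section>
</main>
<footer>
  <p>Contacts</p>
</footer>
```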
For example, the <strong> and <em> tags, like <b> and <i>, display text in bold or italics, but they also carry semantic meaning. Text enclosed in <strong> tags is of increased importance, and text in <em> tags (emphatic stress) carries emotional emphasis. In ordinary speech, we emphasize such words with our voice (intonation, loudness, etc.).
Note that HTML5 also gave semantic meaning to the physical formatting tags. The <b> tag, already mentioned several times, should be used for text that the reader should pay attention to, but without the increased importance or stress that the <strong> tag implies (Link to an article). The <i> tag is intended for text that stands apart from the surrounding text but carries no emotional emphasis.
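A sketch of how the four tags might be chosen by meaning rather than by appearance; the sentences are made-up examples.

```html
<!-- strong: content of increased importance -->
<p><strong>Warning:</strong> save your file before closing the editor.</p>

<!-- em: stress emphasis that changes the meaning of the sentence -->
<p>I <em>did</em> finish the homework.</p>

<!-- b: draws attention without adding importance, e.g. a key term in a summary -->
<p>The <b>display</b> property defines how an element is laid out.</p>

<!-- i: text set apart from the surrounding prose, e.g. a phrase in another language -->
<p>The phrase <i lang="fr">déjà vu</i> came into English from French.</p>
```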
Note that some logical formatting tags are used rather rarely and mostly on sites devoted to particular subjects. For example, the <code>, <var>, <samp>, and <kbd> elements are typically used on websites about programming languages, while <abbr> and <dfn> are more likely to be found on reference sites with information on a particular subject (chemistry, physics, law, etc.). For such elements, it is convenient to define uniform formatting rules for the entire site in CSS.
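A minimal sketch of site-wide rules for the programming-related tags. The colors, borders, and sample text are arbitrary choices for illustration; in a real site these rules would live in a shared stylesheet (or in the <head>).

```html
<style>
  /* One set of rules for the whole site */
  code, kbd, samp { font-family: monospace; }
  code { background: #f4f4f4; padding: 0 4px; }
  kbd { border: 1px solid #999; border-radius: 3px; padding: 0 4px; }
  var { font-style: italic; color: #8b0000; }
</style>

<p>Declare a variable with <code>let <var>count</var> = 0;</code></p>
<p>Press <kbd>Ctrl</kbd> + <kbd>S</kbd> to save the file.</p>
<p>The program prints: <samp>Hello, world!</samp></p>
```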
At this stage, it may be quite difficult to see why all these "semantic complications" are needed if the text looks exactly the same. Nevertheless, as you learn HTML as a hypertext markup language, you will understand why you should pay close attention to semantics and prefer logical formatting tags: they matter more for SEO* and for screen readers, some of which can emphasize certain logical tags with their voice.
*SEO (Search Engine Optimization) is a set of activities aimed at getting a site onto the first pages of search results for certain search phrases or words. Although promoting sites is not the goal of this HTML course, it is very important to take some SEO requirements into account when designing a page, so that you do not have to restructure the site later. Read about SEO:
- Wikipedia — https://en.wikipedia.org/wiki/Search_engine_optimization;
- Google for web-masters — https://www.google.com/intl/en/webmasters/learn/.
To see both groups of tags in use, open the file tags_start.html. The finished markup is in the file tags.html.
HTML5 also added elements that can loosely be called interactive. These include <details> and <summary>, which let you reveal hidden content with a click, as well as <progress> and <meter>, which make more sense to use after you have learned JavaScript, since they are best controlled with it. You can see what these elements look like in the file interective.html.
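A static sketch of these elements. The values are fixed numbers chosen for illustration (in practice JavaScript would update them), and the text between the <progress> and <meter> tags is fallback content for browsers that do not support them.

```html
<details>
  <summary>What is a block element?</summary>
  <p>An element that starts on a new line and takes up the full available width.</p>
</details>

<!-- Progress of a task: value out of max -->
<label for="upload">Upload:</label>
<progress id="upload" value="70" max="100">70%</progress>

<!-- A measurement within a known range; low and high mark the "normal" zone -->
<label for="disk">Disk usage:</label>
<meter id="disk" value="0.6" min="0" max="1" low="0.3" high="0.8">60%</meter>
```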
Deprecated Tags and Attributes
HTML, the hypertext markup language, has changed many times and has gone through a number of standards, each of which introduced some tags and abolished others. The HTML5 standard introduced many new tags, but some tags that were popular in earlier versions of HTML were declared obsolete, or deprecated. You should therefore avoid such tags in documents that follow the HTML5 standard; otherwise the document will not pass validation.
Let me remind you that the browser determines which standard your document belongs to by DOCTYPE, which is specified at the very beginning of the markup, and it looks like this for HTML5:
```html
<!DOCTYPE html>
```
We will not study deprecated tags and attributes in detail in this course, since our goal is to create layouts based on modern standards. Nevertheless, there are many lessons on the Internet, created long ago, that use and even recommend these deprecated tags and attributes.
For example, the <big> tag was previously used to increase the font size; it is now deprecated, whereas the <small> tag, which reduces the font size, is still in use.
The same goes for the <strike> tag: it struck text through in HTML4, but it is obsolete in HTML5. Instead, you can use the <s> tag, which marks information that is no longer accurate or relevant, or the <del> tag, which marks text that has been removed from the document. The <tt> tag, which was previously used to display text in a monospaced font, is also obsolete; use <code>, <kbd>, or <samp>, or the CSS property font-family, instead.
Another example: previously, an abbreviation could be marked with either of two tags, <abbr> or <acronym>. In the HTML5 standard, only <abbr> remains, and <acronym> is deprecated. The reason is that an acronym (an abbreviation pronounced as a word, for example, NATO, DOM, or IMHO, which is often used in comments) differs from other abbreviations (the <abbr> tag) only in how it is pronounced, while from the point of view of HTML it has the same meaning, so a separate tag for acronyms is no longer needed. For example, the documentation at developer.mozilla.org shows this message:
Figure 2
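A side-by-side sketch of the deprecated tags mentioned above and their modern replacements. The class name .large is a hypothetical example; font-size: larger is one possible CSS replacement for <big>.

```html
<style>
  .large { font-size: larger; } /* hypothetical class replacing <big> */
</style>

<!-- Deprecated in HTML5: do not use -->
<big>Large text</big>
<strike>Old price</strike>
<tt>monospaced text</tt>
<acronym title="HyperText Markup Language">HTML</acronym>

<!-- Modern equivalents -->
<span class="large">Large text</span>
<s>Old price</s> or <del>removed text</del>
<code>monospaced text</code>
<abbr title="HyperText Markup Language">HTML</abbr>
```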
There is another deprecated tag that, for some reason, is incredibly popular among students who are starting to learn HTML: the <font> tag, which in earlier standards let you specify the size, color, and font family of text. For example:
Figure 3
The HTML validator for this example produced the following error:
Figure 4
In the past, this tag did the job that is now done by the CSS properties font-size, color, and font-family. Please do not use this tag in your files. It is so outdated that its presence in your code is a clear sign that you do not know HTML and are simply too lazy to deal with CSS.
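A sketch of how a <font> tag can be rewritten with CSS. The text, the color, and the class name .announcement are hypothetical; size="5" corresponds to the CSS keyword x-large.

```html
<!-- Deprecated: appearance mixed into the markup -->
<p><font size="5" color="red" face="Arial">Important announcement</font></p>

<!-- Modern approach: markup describes content, CSS describes appearance -->
<style>
  .announcement {
    font-size: x-large;              /* replaces size="5" */
    color: red;                      /* replaces color="red" */
    font-family: Arial, sans-serif;  /* replaces face="Arial" */
  }
</style>
<p class="announcement">Important announcement</p>
```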
The same applies to attributes that were used quite often in earlier standards. For example, the align attribute, with the values left, right, center, and justify, was very often used to align text in headings, paragraphs, and divs, and even to wrap text around images.
Because everything this attribute did can be done with the CSS property text-align (or the float property for images), the attribute is deprecated.
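A sketch of replacing the align attribute with CSS. The image file name photo.jpg and the class names are hypothetical.

```html
<!-- Deprecated align attribute -->
<h1 align="center">Heading</h1>
<img src="photo.jpg" alt="Photo" align="left">

<!-- The same result with CSS -->
<style>
  .centered { text-align: center; }
  .float-left { float: left; margin: 0 10px 10px 0; } /* gap between image and text */
</style>
<h1 class="centered">Heading</h1>
<img class="float-left" src="photo.jpg" alt="Photo">
<p>This text wraps around the floated image.</p>
```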
Look at the list of deprecated tags and attributes in an article at html.com. A screenshot with a list of tags and attributes can be viewed below (Fig. 5).
Figure 5
If you look closely at this list, you can conclude that tags and attributes became deprecated either because they were no longer needed or because they can be replaced with CSS properties.
Related Links:
- https://www.geeksforgeeks.org/html-deprecated-tags/;
- https://www.tutorialspoint.com/html/html_deprecated_tags.htm;
- https://developer.mozilla.org/en-US/docs/Web/HTML/Element.
Character Entities in HTML. Using Them in an HTML Page
You may have come across special characters in Word documents. They are usually not on the keyboard and are inserted via a special menu item. For example, you cannot type the copyright character (©) or arrow characters (such as ← or →) directly. The situation is the same in HTML: such characters can either be copied from text, for example from a Word document, or added using special codes called character entities. Character entities have two syntax options: a name based on English words, or a number. Both must begin with an ampersand (&) and end with a semicolon (;). You can use either option, but note that not every character has a named entity, whereas a numeric code can be used for any character.
For example, the copyright symbol can be written as the character entity &copy; or &#169;.
In the text editor you will see several characters, but on the HTML page they are displayed as a single character.
You will find tables of various characters, grouped by category, in the file HTML character entities. Most of them will probably never be needed, but there are several characters you should know by heart, since you will use them in different situations.
Copyright or Trademark Signs
One of them is the copyright sign (©), which almost no website can do without. It is usually placed at the bottom of the site (the footer). For example, on the official website of the STEP Academy it looks like this:
Figure 6
The trademark sign (entity &trade; or &#8482;) can also be used next to a company name. For example:
Figure 7
However, this sign is used much less often than the copyright sign.
In the Brackets editor, when you type the first characters of an entity, a hint appears that shows both the complete code and how the character looks on the page (Fig. 8).
Figure 8
You can see the example in the header-section-nav.html file in the examples folder.
If the company for which the site is made has a registered trademark, you can place the registered sign (entity &reg; or &#174;) next to the company name. For example, this entity is used on the Wikipedia site (Fig. 9).
Figure 9
Excerpt from the page code:
Figure 10
Note: to view the source code of a page opened in the browser, press Ctrl + U, or right-click anywhere on the page and select View Page Source in the shortcut menu (Fig. 11).
Figure 11
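A sketch of a footer using these signs in both syntaxes; the company and product names and the year are hypothetical. Each pair of lines renders identically in the browser.

```html
<footer>
  <!-- Named and numeric forms produce the same copyright character -->
  <p>&copy; 2024 Example Company</p>
  <p>&#169; 2024 Example Company</p>

  <!-- Trademark and registered trademark signs next to names -->
  <p>ExampleSoft&trade; and ExampleBrand&reg;</p>
  <p>ExampleSoft&#8482; and ExampleBrand&#174;</p>
</footer>
```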
Using Spaces and Hyphens
Sometimes a phrase must not be split between two lines. The most common cases are a surname with initials, a company name with its form of ownership, and the name of a city or village. For example: A.F. Ivanov, Stroyinvest LLC, Kharkov. To "glue" such phrases together, use the non-breaking space (&nbsp; or &#160;) instead of a regular space. Compare how the text looks in Figure 12.
Figure 12
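A sketch reusing the names from the text above. With regular spaces the browser may break a line between "A.F." and "Ivanov"; with non-breaking spaces it moves the whole phrase to the next line instead.

```html
<!-- Regular spaces: a line break may appear between the parts of a phrase -->
<p>The report was prepared by A.F. Ivanov, Stroyinvest LLC, Kharkov.</p>

<!-- Non-breaking spaces keep each phrase on one line -->
<p>The report was prepared by A.F.&nbsp;Ivanov, Stroyinvest&nbsp;LLC, Kharkov.</p>

<!-- The numeric form is equivalent to &nbsp; -->
<p>A.F.&#160;Ivanov</p>
```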
Another situation is the need to hyphenate words, because a text block may be too narrow for long words such as "recommendations". HTML itself does not hyphenate words by syllables automatically; full support for this is still a matter of the near future. If you need to break a long word now, you can insert soft hyphens (&shy; or &#173;) in the appropriate places, following the hyphenation rules of the language. How many soft hyphens you add depends on the length of the word and your needs; you can place one after each syllable (Fig. 13).
Figure 13
Rec­om­men­da­tion
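A sketch of soft hyphens in a deliberately narrow block; the .narrow class and its 80px width are arbitrary values chosen so that the word does not fit on one line.

```html
<style>
  /* A narrow column that is too small for the long word */
  .narrow { width: 80px; border: 1px solid #ccc; }
</style>

<!-- Without soft hyphens the word cannot be broken and overflows the block -->
<p class="narrow">Recommendations</p>

<!-- &shy; marks places where the word may be broken; the hyphen
     is displayed only where the line actually breaks -->
<p class="narrow">Rec&shy;om&shy;men&shy;da&shy;tions</p>
```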
For automatic hyphenation, the CSS property hyphens was introduced. With the value auto, and with the page language specified in the lang attribute of the <html> tag, long words are hyphenated automatically using the dictionary for that language. For example, the text that we previously hyphenated with the special character &shy; can instead be hyphenated with these CSS styles:
```html
<html lang="ru">
...
<style>
  .hyphens {
    -moz-hyphens: auto;
    -ms-hyphens: auto;
    -webkit-hyphens: auto;
    hyphens: auto;
  }
</style>
```
The downside of this method is that such hyphenation is not supported by all browsers. For example, at the time of writing, the text in Firefox appears with hyphens, while in Chrome we see "ragged edges" on the right side (Fig. 14).
Figure 14
See the example in the file entities.html.
To find out how well a particular CSS property is supported by browsers, look it up on caniuse.com.
In Figure 15, green indicates browsers and versions that support the property, red indicates those that do not, and olive indicates support with some condition (for example, starting with version 55, Chrome supports it only on Android and Mac).
Figure 15
A minus sign indicates that the property is supported only with a so-called vendor prefix:
- -moz- for Mozilla Firefox;
- -ms- for Internet Explorer;
- -webkit- for Safari, Chrome, and other Chromium-based browsers.
The Use of Entities
There are many character entities, but I would like to highlight two that relate to HTML code. Reference sites such as https://www.w3schools.com/ or https://developer.mozilla.org/ contain lots of code examples that you can copy into your text editor and try out. How can you display tags as text instead of having the browser interpret them? To do this, all angle brackets are "encoded" with entities: the left angle bracket (the less-than sign <) as &lt;, and the right angle bracket (the greater-than sign >) as &gt;.
After that, the code is displayed as plain text.
Figure 17
Figure 18
In the example in the file entities.html, you can see that after the brackets are replaced with entities, the HTML code no longer looks like markup in the text editor, but on the page it looks exactly like HTML code.
Figure 19
To make the lines of text look the same on the page as in the editor, set the following CSS property for the <code> tag:
```css
code { white-space: pre; }
```
Now all spaces, tabs, and line breaks are displayed on the HTML page (Fig. 20).
Figure 20
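A minimal sketch combining both techniques: the angle brackets of a small list (an arbitrary example) are written as &lt; and &gt;, so the browser shows the tags as text, and white-space: pre keeps the indentation and line breaks.

```html
<style>
  /* Keep spaces, tabs, and line breaks exactly as in the source */
  code { white-space: pre; }
</style>

<!-- The browser displays these tags as text instead of building a list -->
<code>&lt;ul&gt;
  &lt;li&gt;First item&lt;/li&gt;
  &lt;li&gt;Second item&lt;/li&gt;
&lt;/ul&gt;</code>
```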
Content management systems (CMS) also convert angle brackets into entities when you try to insert them into the text of an article. And the server-side programming language PHP (Hypertext Preprocessor), which is used to build many websites and to process forms, has a special function that converts angle brackets into entities to prevent the insertion of malicious code. However, we will talk about this in other courses.
Sometimes you might need a short dash (&ndash; or &#8211;) or a long dash (&mdash; or &#8212;). Quotation marks, arrows, or currency symbols may also be useful. They can either be copied from a text editor (for example, Microsoft Word) or added as HTML entities.
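A few commonly used entities in context; the sentences and amounts are made up for illustration.

```html
<p>Pages 10&ndash;20</p>                     <!-- short dash, also &#8211; -->
<p>HTML&mdash;the markup language</p>        <!-- long dash, also &#8212; -->
<p>&laquo;Quote&raquo; and &ldquo;quote&rdquo;</p>
<p>&larr; Back | Next &rarr;</p>
<p>Prices: 100&euro;, &pound;80, &yen;900</p>
```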
All other entities are used quite rarely, usually on specialized sites where physical or chemical formulas are needed, and so on.
CSS Properties for Text Formatting
Today, we'll look at the properties used to break lines, to set the spacing between characters or words, and to decorate text.
1. The white-space property controls how spaces and line breaks in the text are displayed. The default value, normal, collapses sequences of spaces and wraps lines automatically when the text reaches the right edge of the browser window or the parent block.
white-space: normal | nowrap | pre | pre-line | pre-wrap
- The nowrap value collapses spaces and ignores the line breaks in the HTML code, so all the text is displayed on one line. If the text does not fit the width of the parent container, a horizontal scroll bar may appear. Only a <br> tag moves the text to a new line.
Figure 21
- The pre value displays the text with all the spaces, tabs, and line breaks that are present in the HTML code. Lines are not wrapped, so a horizontal scroll bar may appear in the browser if a line is too long.
- The pre-line value collapses sequences of spaces but keeps line breaks, and the text wraps to the next line if it does not fit into the parent container.
- The pre-wrap value keeps all the spaces and line breaks from the text editor, but if a line does not fit the width of the parent container, the text automatically wraps to the next line.
Emmet: wsn, wsp, wsnw, wspl, wspw.
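A minimal sketch of the five values, assuming a hypothetical .box class that gives each paragraph a fixed 200px width so the wrapping differences are visible. The extra spaces and line breaks inside the paragraphs are intentional.

```html
<style>
  .box { width: 200px; border: 1px solid #ccc; margin-bottom: 10px; }
  .normal   { white-space: normal; }
  .nowrap   { white-space: nowrap; }
  .pre      { white-space: pre; }
  .pre-line { white-space: pre-line; }
  .pre-wrap { white-space: pre-wrap; }
</style>

<p class="box normal">Spaces    and
line breaks in the source are collapsed; text wraps.</p>
<p class="box nowrap">Spaces are collapsed and the whole text stays on one line.</p>
<p class="box pre">Spaces    and
line breaks are kept; long lines are not wrapped.</p>
<p class="box pre-line">Spaces    are collapsed,
but line breaks are kept and text wraps.</p>
<p class="box pre-wrap">Spaces    and
line breaks are kept, and text wraps.</p>
```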
2. The word-break property controls whether lines can break inside words that do not fit into the width of the parent container. It has the following values:
word-break: normal | break-all | keep-all
- The normal value (default) moves the whole word to the next line when it reaches the right edge of the parent block.
- The break-all value breaks a word between any two characters so that it fits into the width of the parent block (this does not apply to Chinese, Korean, or Japanese text).
Figure 22
- The keep-all value prevents the line breaking for text in Chinese, Korean or Japanese. For other languages, it corresponds to normal.
Emmet: wbn, wbba, wb:ka.
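A sketch comparing normal and break-all on one very long word, assuming a hypothetical .box class with a 120px width.

```html
<style>
  .box { width: 120px; border: 1px solid #ccc; }
  .normal    { word-break: normal; }
  .break-all { word-break: break-all; }
</style>

<!-- The long word overflows the 120px box -->
<p class="box normal">Supercalifragilisticexpialidocious</p>

<!-- The word is broken between letters so that it fits the box -->
<p class="box break-all">Supercalifragilisticexpialidocious</p>
```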
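3. The word-wrap property specifies whether words that do not fit into the width of the parent container may be broken. In the current CSS specification this property is called overflow-wrap, and word-wrap is kept as an alias. It has the following values: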
word-wrap: normal | break-word
- The normal value (default) breaks lines only between words or where a <br> tag is added inside the text, so a word that is too long can overflow the parent block.
- The break-word value adds the line breaking so that the word fits into the specified width of the parent block.
Figure 23
Emmet: wwn, wwb.
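A sketch with a long URL in a narrow box (the .box class and its width are hypothetical). The second rule sets both the legacy name word-wrap and the current name overflow-wrap.

```html
<style>
  .box { width: 120px; border: 1px solid #ccc; }
  .wrap-normal { word-wrap: normal; }
  .wrap-break  { word-wrap: break-word; overflow-wrap: break-word; }
</style>

<!-- The URL is longer than the box and overflows it -->
<p class="box wrap-normal">Visit https://developer.mozilla.org/en-US/docs/Web/CSS</p>

<!-- The URL is broken because it cannot fit on a line by itself -->
<p class="box wrap-break">Visit https://developer.mozilla.org/en-US/docs/Web/CSS</p>
```

Unlike word-break: break-all, the break-word value breaks only a word that cannot fit on a line by itself; ordinary words such as "Visit" still wrap as whole words.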
4. One of the new properties added to the CSS3 specification is writing-mode, which defines the direction of text on the page — horizontal or vertical. The default value is horizontal-tb:
writing-mode: horizontal-tb | vertical-rl | vertical-lr
- The value horizontal-tb lays text out horizontally: characters run from left to right (for left-to-right languages), and lines follow one another from top to bottom.
- The value vertical-rl lays text out vertically: characters run from top to bottom, and lines follow one another from right to left.
- The value vertical-lr lays text out vertically: characters run from top to bottom, and lines follow one another from left to right.
Applies to all elements except table row groups, table column groups, table rows, and table columns.
Figure 24
Emmet: wm.
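A sketch showing the three values side by side; the .box class, its height, and the two-line text are arbitrary choices for the demonstration.

```html
<style>
  .box { height: 150px; border: 1px solid #ccc; display: inline-block; margin-right: 10px; }
  .htb { writing-mode: horizontal-tb; }
  .vrl { writing-mode: vertical-rl; }
  .vlr { writing-mode: vertical-lr; }
</style>

<!-- The first line is at the top -->
<div class="box htb">First line<br>Second line</div>
<!-- The first line is on the right -->
<div class="box vrl">First line<br>Second line</div>
<!-- The first line is on the left -->
<div class="box vlr">First line<br>Second line</div>
```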
5. The word-spacing property sets the spacing between words. The default value, normal, uses the spacing defined by the font. You can also specify a size in any units except %:
word-spacing: number in px, pt, em (but not in %) | normal
Emmet: wos.
Negative values that reduce the distance between words are also supported.
When text-align: justify is set for the text, the browser stretches the spacing between words, but the spacing will never be less than the value specified in word-spacing.
Figure 25
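A sketch of positive, negative, and justified word spacing; the class names and pixel values are arbitrary examples.

```html
<style>
  .wide   { word-spacing: 10px; }   /* adds 10px to the normal gap */
  .narrow { word-spacing: -2px; }   /* negative values bring words closer */
  .justified {
    width: 250px;
    text-align: justify;  /* may widen the gaps further to fill each line... */
    word-spacing: 5px;    /* ...but never below the normal gap plus 5px */
  }
</style>

<p class="wide">Words with extra spacing</p>
<p class="narrow">Words with reduced spacing</p>
<p class="justified">Justified text stretches the spaces between words so
that every full line reaches both edges of the block.</p>
```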
6. The letter-spacing property sets the spacing between characters. By default, this spacing is determined by the font. You can specify negative values if necessary, as long as this does not spoil the appearance of the text.
letter-spacing: number in px, pt, em (but not in %) | normal
Emmet: ltsn, lts:number.
```css
.lettering { letter-spacing: 6px; }
```
Figure 26
7. The text-transform property changes the case of text by converting characters to uppercase or lowercase. The value none leaves the text unchanged, and capitalize converts the first letter of each word to uppercase (this is more typical of English).
text-transform: capitalize | lowercase | uppercase | none
Figure 27
Emmet: ttn, ttc, ttl, ttu.
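A sketch of the three transforming values; the comments show how each line is displayed. The class names are hypothetical.

```html
<style>
  .upper { text-transform: uppercase; }
  .lower { text-transform: lowercase; }
  .cap   { text-transform: capitalize; }
</style>

<p class="upper">text in capital letters</p>        <!-- TEXT IN CAPITAL LETTERS -->
<p class="lower">TEXT IN SMALL LETTERS</p>          <!-- text in small letters -->
<p class="cap">the first letter of each word</p>    <!-- The First Letter Of Each Word -->
```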
8. The text-decoration property adds text underlining (underline value), strikethrough (line-through value) or overlining (overline value). The value of none leaves the text unchanged.
text-decoration: underline | line-through | overline | none
Figure 28
The text-decoration property is a shorthand for three properties that can be set individually or together in text-decoration:
text-decoration: text-decoration-line text-decoration-style text-decoration-color
The text-decoration-line property defines the line type and has values as in the main property:
text-decoration-line: line-through | overline | underline | none
You can add several lines by listing values separated by spaces (for example, underline overline).
The line style is defined by the text-decoration-style property:
text-decoration-style: solid | double | dotted | dashed | wavy
The values of this property are as follows:
- solid — solid single line;
- double — double line;
- dotted — dotted line;
- dashed — dashed line;
- wavy — wavy line.
The text-decoration-color property specifies the color of the line. Its values can be set in the same way as for the color property.
Figure 29
Emmet: tdn, tdlt, tdo, tdu.
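A sketch of the shorthand and the longhand properties; the class names, colors, and texts are hypothetical.

```html
<style>
  .under { text-decoration: underline; }
  /* several lines at once: values are separated by spaces */
  .both  { text-decoration-line: underline overline; }
  /* the three longhand properties set separately */
  .wavy {
    text-decoration-line: underline;
    text-decoration-style: wavy;
    text-decoration-color: red;
  }
  /* line, style, and color combined in the shorthand */
  .short { text-decoration: line-through double blue; }
  /* removes the default underline from a link */
  a.plain { text-decoration: none; }
</style>

<p class="under">Underlined text</p>
<p class="both">Underlined and overlined text</p>
<p class="wavy">Text with a red wavy underline</p>
<p class="short">Text struck through with a double blue line</p>
<a class="plain" href="#">Link without an underline</a>
```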
9. The text-shadow property allows you to add a shadow to the text. The values are: the horizontal offset (offX), the vertical offset (offY), the blur radius (blur), and the color (color).
By default, any text has no shadow (none).
text-shadow: none | offX offY blur color
Figure 30
You can add several shadows by listing their parameters separated by commas.
text-shadow: offX1 offY1 blur1 color1, offX2 offY2 blur2 color2, offX3 offY3 blur3 color3;
Figure 31
Emmet: ts.
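A sketch with one shadow, a glow made of two unshifted shadows, and several stacked sharp shadows; the values and class names are arbitrary examples.

```html
<style>
  /* one shadow: 2px right, 2px down, 4px blur, gray */
  .soft { text-shadow: 2px 2px 4px gray; }
  /* two shadows without offset create a glow effect */
  .glow { color: white; background: #222; text-shadow: 0 0 5px blue, 0 0 10px cyan; }
  /* several sharp shadows (blur 0) stacked diagonally */
  .layers { text-shadow: 1px 1px 0 red, 2px 2px 0 orange, 3px 3px 0 gold; }
</style>

<h2 class="soft">Soft shadow</h2>
<h2 class="glow">Glowing text</h2>
<h2 class="layers">Layered shadows</h2>
```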
You can see examples of these CSS properties in the file css-properties.html.
Home Assignment
In this homework, you will format the text of an article (use the <article> tag) with different tags and CSS styles. Note that the font on the page is sans-serif and that the article is narrower than the browser window. We will not yet consider in detail how to do this; simply use the following CSS properties:
```css
article { width: 90%; margin: auto; }
```
This will reduce the width of the article and center it in the browser.
Note that the text often uses bold, sometimes combined with color (red, yellow, green, etc.). Think about which inline elements you can use in the HTML markup, and do not forget to assign different classes for the different colors.
You can find the task in the folder HW2. There is a text file and an image with the final appearance of the article.
© STEP IT Academy, itstep.org
All the copyrighted photos, audio, and video works, fragments of which are used in the material, are the property of their respective owners. Fragments of the works are used for illustrative purposes to the extent justified by the objective, within the educational process, and for educational purposes, in accordance with the Act of “On Copyright and Related Rights”. The scope and method of the cited works are in accordance with the adopted norms, without prejudice to the normal exploitation of copyright, and do not prejudice the legitimate interests of authors and right holders. At the time of use, the cited works fragments cannot be replaced by alternative, non-copyrighted counterparts and meet the criteria for fair use. All rights reserved. Any reproduction of the materials or its part is prohibited. Use of the works or their fragments must be agreed upon with authors and rights holders. Agreed material use is only possible with reference to the source. Responsibility for unauthorized copying and commercial use of the material is determined by the current legislation.