Saturday, November 28, 2015

小夜子 / 初音ミク

小夜子
Vocals: 初音ミク (Hatsune Miku)
Lyrics: ミキト
Music: ミキト
Arrangement: ミキト
by 鹿乃

冷蔵庫の中には何にも無い 只あるのはお茶とお薬
reizouko no nakani wa nanimo nai tada arunowa o cha to okusuri

一錠ごとに胸がふわふわ 不安が満ちてく
ichi jou gotoni mune ga fuwafuwa fuan ga michi teku


iphone 撫でるその指先も べたべたと粘る髪の毛も
iphone nade rusono yubisaki mo betabetato nebaru kaminoke mo

何一つ綺麗なもんなんて 有る筈も無いな
nani hitotsu kirei namonnante aru hazumo nai na


死にたくて 死にたくて そっと
shini takute shini takute sotto

間違って 傷をつけた手首は
machigatte kizu wotsuketa tekubi wa

いつしか 茶色く汚れてる
itsushika chairo ku yogore teru

締め切ったボクの瞼
shimekitta boku no mabuta

カーテンの隙間に朝が来ても
ka ten no sukima ni asa ga kite mo

気付く筈無い
kizuku hazu nai


友達のエリもタカユキも 本当のトコ 他人のコトなど
tomodachi no eri mo takayuki mo hontou no toko hito no koto nado

気にしてる暇も無いくらい 忙しそうだしな
kini shiteru hima mo nai kurai isogashi soudashina

それにしても何この笑窪 ありがちな家族と人生
sorenishitemo naniko no ekubo arigachina kazoku to jinsei

何一つ誇れるもんなんて 有る筈も無いな
nani hitotsu hokore rumon nante aru hazu mo nai na


眠たくて 眠たくて ずっと
nemuta kute nemuta kute zutto

このまんま痺れるほど眠ったら
konomanma shibire ruhodo nemutta ra

起きて リンゴ齧って眠る
oki te ringo kajitte nemuru

無意識 装って ゆらり
muishiki yosootte yurari

ベランダに登って風が吹いても
be randa ni nobotte kaze ga fui temo

飛べる筈無い
tobe ru hazu nai


あんなに好きなお笑いも
annani suki nao warai mo

人生変えた音楽でさえ
jinsei kae ta ongaku desae

何故に僕の事を否定するの
naze ni boku no koto wo hitei suruno


死にたくて 死にたくて そっと
shini takute shini takute sotto

間違って 傷をつけた手首は
machigatte kizu wotsuketa tekubi wa

いつしか茶色く汚れてる
itsushika chairo ku yogore teru

締め切ったボクの瞼
shimekitta boku no mabuta

カーテンの隙間に朝が来ても
ka ten no sukima ni asa ga kite mo

キヅカナイヨ
kizukanaiyo


そんな日が そんな日が ずっと続くんやって嘆いても
sonna hi ga sonna hi ga zutto tsuzuku nyatte nagei temo

何かが 癒えるわけじゃ無い
nanika ga ie ruwakeja nai

癒える筈無い
ie ru hazu nai

キエテシマオウ
kieteshimaou

うん、消えてしまおう
un, kie teshimaou

Saturday, November 14, 2015

異常流行幻象與群眾瘋狂 Extraordinary Popular Delusions and The Madness of Crowds

異常流行幻象與群眾瘋狂
Extraordinary Popular Delusions and The Madness of Crowds
Author: Charles Mackay
Translator: 李祐寧
Publisher: 大牌

via 博客來
“Madness is something rare in individuals — but in groups, parties, peoples, and ages, it is the rule.”

──Friedrich Nietzsche, Beyond Good and Evil

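Every age has its own peculiar follies: some born of greed, some of delusion, and others of political or religious fanaticism. The most obvious errors are visible only to the most detached onlookers. In every era, in the West as in the East, human crowds are periodically seized by some form of frenzy, whether in an inexplicable social movement or in financial and commercial markets. The manias, desires, and madness described in this book are not confined to the people of that time; they reveal the gene for madness in human nature: conformity and greed.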

A History of Human Folly

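Examining the history of nations, we find that, like individuals, they have their whims and peculiarities, and in their seasons of excitement and recklessness they often care nothing for what they do. Whole communities suddenly fix their minds on a single object and pursue it regardless of everything else; millions of people become simultaneously possessed by one delusion and chase it with their whole being, until some new folly, sillier but more captivating, catches their attention. One nation may suddenly be seized, from top to bottom, with a fierce desire for military glory; another may just as suddenly sink into religious fanaticism, recovering its senses only after rivers of blood have been shed and the land is filled with lamentation, while its innocent descendants pay the price. The early annals of Europe record crowds obsessed with the sepulchre of Jesus, pouring into the Holy Land in frenzied multitudes; another age was gripped by fear of the devil, and hundreds of thousands of victims lost their lives in the name of witchcraft. At yet another time, the many became infatuated with the philosopher's stone (said to turn base metals into gold) and made utter fools of themselves until the craze passed. There was even a period when most countries of Europe regarded killing an enemy by slow poison as a forgivable offence: people who would have recoiled at plunging a knife into a man's heart felt no qualms about drugging his soup. This kind of murder spread among ladies of good birth and refined manners, and under their patronage it became a fashion. Some outrageous delusions have flourished and endured in civilized and refined nations just as they did among the barbarous peoples that gave rise to them. Dueling is one example; belief in omens and divination is another, and the progress of knowledge has so far failed to root such beliefs out of the popular mind. Money, too, has often been the source of popular delusions. Sober nations have become desperate gamblers and staked their very existence on the fate of a piece of paper. Tracing the most prominent of these delusions is the aim of this book. As has been well said, men, like herd animals, go mad in herds, and recover their senses only slowly, one by one.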

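Popular delusions began so early, spread so fast, and have lasted so long that the present volume should be regarded as a miscellany of delusions rather than a history: just one chapter of that vast and appalling history of human folly which has yet to be written, and which Porson once joked he would write in five hundred volumes. Interspersed throughout are accounts of some lighter matters, amusing instances of people's imitativeness and mistaken beliefs, rather than a sole focus on folly and delusion.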

Saturday, July 4, 2015

這才是數學 Measurement

這才是數學
A Journey of Exploration from Not Knowing to Wanting to Know
Measurement
Author: Paul Lockhart
Translator: 畢馨云
Publisher: 經濟新潮社
via pansci.tw

“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”
“How can it be that mathematics, being after all a product of human thought which is independent of experience, is so admirably appropriate to the objects of reality?”

──Albert Einstein

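There are many kinds of reality. One is the physical world we live in. Then there are imaginary worlds very much like the physical one, such as the world in which "everything is as usual and I did not wet my pants in fifth grade," or the one in which "the dark-haired beauty on the train turns to talk to me and we fall in love."
But I want to talk about a completely different kind of world, which I call "mathematical reality." In this world, beautiful geometric shapes and patterns soar about, doing interesting things that amaze me. It is a wonderful world, and I really love it.
The trouble is that the physical world is a disaster. It is too complicated, and nothing is what it seems. Objects expand with heat and contract with cold; atoms are constantly jiggling about. Nothing can truly be measured. We do not know the exact length of a blade of grass. Every measurement in this universe is necessarily an approximation; that is simply the nature of the universe. Here, the smallest speck is not a point, and the thinnest thread is not a line.
Mathematical reality, on the other hand, is an imaginary world that I can make as simple and beautiful as I please. I can have perfect things there that I could never have in real life. I cannot hold a circle in my hand, but I can hold one in my head, and I can even measure it. Mathematical reality is a beautiful place of my own making, one I can visit, think about, and discuss with friends.
There are many reasons to be interested in the physical world. Astronomers, biologists, chemists, and experts in other fields keep trying to understand how the universe works and to describe it.
I, on the other hand, want to describe mathematical reality: to make patterns and figure out how they work. That is what mathematicians like me try to do.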

Saturday, June 27, 2015

聖母峰之死 Into Thin Air

聖母峰之死
Into Thin Air
Author: Jon Krakauer
Translators: 宋碧雲, 林曉欽
Publisher: 大家出版社
via cite.com
“Men play at tragedy because they do not believe in the reality of the tragedy which is actually being staged in the civilised world.”
──José Ortega y Gasset 
via wiki
 It would seem almost as though there were a cordon drawn round the upper part of these great peaks beyond which no man may go. The truth of course lies in the fact that, at altitudes of 25,000 feet and beyond, the effects of low atmospheric pressure on the human body are so severe that really difficult mountaineering is impossible and the consequences even of a mild storm may be deadly, that nothing but the most perfect combinations of weather and snow offers the slightest chance of success, and that on the last lap of the climb no party is in position to choose its day...
 No, it is not remarkable that Everest did not yield to the first few attempts; indeed it would have been very surprising and not a little sad if it had, for that is not the way of great mountains. Perhaps we had become a little arrogant with our fine new technique of ice-claw and rubber slipper, our age of easy mechanical conquest. We had forgotten that the mountain still holds the master card, that it will grant success only in its own good time. Why else does mountaineering retain its deep fascination?
Eric Shipton, in 1938, 
Upon that Mountain

Thursday, June 25, 2015

Coursera R Programming Week 4 Notes

R Programming
Johns Hopkins Bloomberg School of Public Health

Week 4
Study notes for Week 4


June 1, 2015 - June 29, 2015

Week 4: Simulation and Profiling

This week covers how to simulate data in R, which serves as the basis for doing simulation studies. We also cover the profiler in R which lets you collect detailed information on how your R functions are running and to identify bottlenecks that can be addressed. The profiler is a key tool in helping you optimize your programs. Finally, we cover the str function, which I personally believe is the most useful function in R.
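A minimal sketch of the profiler in action, assuming a deliberately repetitive `lm()` workload so that the sampler has something to record. `Rprof()` starts and stops sampling and `summaryRprof()` tabulates the samples; the data size, the 50 iterations and the 10 ms interval are arbitrary illustrative choices, and the timings will differ from machine to machine.

```r
## Simulated data for a workload that runs long enough to be sampled
set.seed(42)
n <- 1e5
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)   # start sampling the call stack every 10 ms
for (i in 1:50) fit <- lm(y ~ x)    # the code being profiled: repeated model fits
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)    # time spent in each function's own code
head(prof$by.total)   # time including everything each function called
```

In `by.total`, a top-level call such as `lm` accounts for most of the run time because its callees are included; `by.self` points at where the time is actually spent, which is usually where a bottleneck can be addressed.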

Learning Objectives

By the end of this week you should be able to:
  • Call the str function on an arbitrary R object
  • Describe the difference between the "by.self" and "by.total" output produced by the R profiler
  • Simulate a random normal variable with an arbitrary mean and standard deviation
  • Simulate data from a normal linear model
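The simulation objectives above can be tried out with a few lines of base R. This is a sketch with arbitrary illustrative values: mean 50 and standard deviation 10 for the first variable, and intercept 0.5, slope 2 and noise standard deviation 2 for the linear model.

```r
set.seed(20150622)                     # fix the seed so the draws are reproducible

## A normal random variable with an arbitrary mean and standard deviation
z <- rnorm(1000, mean = 50, sd = 10)
str(z)                                 # compact display: type, length, first values
summary(z)

## Data from a normal linear model: y = b0 + b1 * x + e, with e ~ N(0, 2^2)
x <- rnorm(100)
e <- rnorm(100, mean = 0, sd = 2)
y <- 0.5 + 2 * x + e
fit <- lm(y ~ x)
coef(fit)                              # estimates should land near 0.5 and 2

str(data.frame(x, y))                  # str() works on any R object, e.g. a data frame
```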

Programming

There is a graded programming assignment for this week.
  • Programming assignment 3: Hospital Quality

Saturday, June 20, 2015

Coursera R Programming Week 3 Notes

R Programming
Johns Hopkins Bloomberg School of Public Health

Week 3
Study notes for Week 3


June 1, 2015 - June 29, 2015

R Programming: Week 3

We have now entered the third week of R Programming which also marks the halfway point. The lectures this week cover loop functions and the debugging tools in R. These aspects of R make R useful for both interactive work and writing longer code, and so they are commonly used in practice.

The Programming Assignment is challenging and so I encourage you to start early if you have the chance. It requires you to explore some of the more interesting aspects of the R language, including taking advantage of the scoping rules to implement state preservation in R objects.
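The idea of using scoping rules to preserve state can be sketched with a closure. The helper name `make_cached_mean()` is hypothetical and not part of the assignment; it stores a vector, computes its mean on first request, and uses `<<-` to update variables that live in the enclosing environment.

```r
## Hypothetical helper: a vector bundled with a lazily cached mean
make_cached_mean <- function(v = numeric()) {
  cached <- NULL                    # lives in the enclosing environment
  set <- function(new_v) {
    v <<- new_v                     # `<<-` assigns in the enclosing environment
    cached <<- NULL                 # invalidate the cache when the data change
  }
  get <- function() v
  mean_cached <- function() {
    if (!is.null(cached)) {
      message("using cached mean")
      return(cached)
    }
    cached <<- mean(v)              # compute once and remember the result
    cached
  }
  list(set = set, get = get, mean = mean_cached)
}

obj <- make_cached_mean(c(1, 2, 3, 4))
obj$mean()          # computes and stores the mean
obj$mean()          # returns the stored value and prints "using cached mean"
obj$set(c(10, 20))
obj$mean()          # the cache was cleared, so the mean is recomputed
```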

Note that the programming assignment this week is implemented as a Peer Assessment, so you will not see it listed with the other Programming Assignments. Please go to the Programming Assignment 2 section of the course to find the assignment instructions. Also, for this assignment, you will need to set up your GitHub account if you have not yet done so.

Best of luck!
Roger Peng and the Data Science Team
Mon 15 Jun 2015 8:01 AM CST

Week 3: Loop Functions and Debugging

This week is what I call "loop functions" in R, which are functions that allow you to execute loop-like behavior in a compact form. These functions typically have the word "apply" in them and are particularly convenient when you need to execute a loop on the command line when using R interactively. These functions are some of the more interesting functions of the R language. This week we also cover the debugger that comes with R and how to interpret its output to help you find problems in your programs and functions.

Learning Objectives

By the end of this week you should be able to:
  • Define an anonymous function and describe its use in loop functions [see lapply]
  • Describe how to start the R debugger for an arbitrary R function
  • Describe what the traceback() function does and what is the function call stack
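A small illustration of loop functions with anonymous functions; the list contents are arbitrary. The debugging calls are shown only as comments because `traceback()` and `debug()` are meant for interactive sessions.

```r
x <- list(a = 1:5, b = c(3, 9, 1), c = c(2, 4, 6))

## lapply() applies an anonymous function to each element and returns a list
lapply(x, function(elt) max(elt) - min(elt))

## sapply() does the same but simplifies the result to a named vector
ranges <- sapply(x, function(elt) max(elt) - min(elt))
ranges

## vapply() also checks that every result is a single numeric value
vapply(x, function(elt) mean(elt), numeric(1))

## Debugging tools, for interactive use:
## spread <- function(v) max(v) - min(v)
## spread(letters)   # error: non-numeric argument to binary operator
## traceback()       # prints the call stack at the time of the error
## debug(spread)     # the next call to spread() opens the browser to step through it
```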

Programming

There is a graded programming assignment for this week. Please note that this assignment is graded via peer assessment.
  • Programming assignment 2: Lexical Scoping

Wednesday, June 17, 2015

Coursera The Data Scientist’s Toolbox Week 3 Notes

The Data Scientist’s Toolbox
Johns Hopkins Bloomberg School of Public Health

Week 3
Study notes for Week 3

June 1, 2015 - June 29, 2015

Video Lectures
Week 3 (34:38)

Types of Data Science Questions (9:09)

The types of questions below are listed roughly in order of how hard it is to actually achieve the goal of the analysis:
  1. Descriptive
  2. Exploratory
  3. Inferential
  4. Predictive
  5. Causal
  6. Mechanistic

Descriptive analysis

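The goal of a descriptive analysis is simply to describe a set of data. You do not make any decisions on the basis of that description; describing the data and interpreting it are two separate steps.
Without additional statistical modeling, these descriptions usually do not generalize: you are describing only what you see in this particular data set, and you cannot say that the next person will see the same thing.
Descriptive analysis is also the most common type of analysis in a census.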

United States Census 2010

Google Ngram Viewer

These examples only describe what happened; you cannot use them to make predictions!
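In R, a descriptive analysis can be as simple as summarizing the data at hand. This sketch uses the built-in `mtcars` data set purely as an example; nothing here generalizes beyond those 32 cars.

```r
data(mtcars)                            # built-in data from the 1974 Motor Trend US magazine
summary(mtcars$mpg)                     # minimum, quartiles, mean and maximum fuel economy
table(mtcars$cyl)                       # how many cars have 4, 6 or 8 cylinders
tapply(mtcars$mpg, mtcars$cyl, mean)    # average mpg within each cylinder group
```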

Exploratory analysis

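In this type of analysis you look at the data and try to find relationships you did not know about before, without necessarily confirming them. Exploratory analysis is therefore useful for discovering new connections and for defining future data science projects, in which you then try to confirm what you found while exploring.
For any real question, an exploratory analysis usually does not have the final say, and it usually should not be used for generalizing or predicting.
One important point, which you may already have heard: "correlation does not imply causation."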

Liu et al. (2012) Scientific Reports

The Sloan Digital Sky Survey

Inferential analysis

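The goal of inferential analysis is to use a relatively small sample of data to say something about a larger population. Most of the statistical models and data you have heard of are used for inference. It involves estimating the quantity you care about as well as the uncertainty of that estimate, and it depends heavily on the population you observe and on the sampling scheme you use.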
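To make the idea concrete, here is a sketch that simulates a hypothetical population of heights, draws a small random sample, and uses `t.test()` to estimate the population mean together with a confidence interval expressing its uncertainty. All numbers are made up for illustration.

```r
set.seed(2013)
population <- rnorm(1e5, mean = 170, sd = 8)   # hypothetical population of heights (cm)
smp <- sample(population, 50)                  # simple random sample of 50 people

est <- t.test(smp)
est$estimate        # point estimate of the population mean
est$conf.int        # 95% confidence interval: the uncertainty of that estimate
mean(population)    # the "true" value we are trying to infer
```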

Correia et al. (2013) Epidemiology (air pollution control in the United States and life expectancy)

Predictive analysis

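Predictive analysis uses data collected on some objects to predict values for another object you may observe next.
An important point: even if x predicts y, that does not mean x causes y.
Accurate prediction depends heavily on measuring the right variables. Prediction models range from good to bad, but more data and a simple model often work surprisingly well.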
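A sketch of a predictive workflow, assuming simulated data with a made-up linear relationship: fit a simple model on part of the data and measure how well it predicts the held-out part. A good prediction here says nothing about whether `x` causes `y`.

```r
set.seed(538)
n <- 200
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 3 + 1.5 * dat$x + rnorm(n, sd = 2)    # hypothetical true relationship plus noise

train_idx <- sample(n, 150)                     # 150 rows used for fitting
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]                      # 50 held-out rows

fit  <- lm(y ~ x, data = train)
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))           # root-mean-square prediction error
rmse
```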

“Prediction is very difficult, especially about the future.”


http://fivethirtyeight.blogs.nytimes.com/ (Nate Silver forecasting US elections on his blog FiveThirtyEight)


How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did


Causal analysis

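Causal analysis asks: if you change the value of one variable, what happens, and how does that change the value of another variable?
In general, the gold standard for causal analysis is a randomized study or randomized controlled trial. You can study observational data already stored in a database, but the results are much harder to convince others of, because you must make stronger assumptions about how your model works. Causal relationships are usually identified as average effects: for example, a group given a particular drug may, on average, live somewhat longer than a group that was not given it.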
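A sketch of how randomization supports an average causal effect, using entirely simulated data: subjects are randomly assigned to a hypothetical drug or control group, and the effect is estimated as the difference in mean survival. The 2-year effect built into the simulation is an assumption chosen for illustration.

```r
set.seed(1)
n <- 200
group <- sample(rep(c("drug", "control"), each = n / 2))  # random assignment, 100 per arm

baseline <- rnorm(n, mean = 70, sd = 5)                   # hypothetical survival without treatment (years)
survival <- baseline + ifelse(group == "drug", 2, 0)      # assumed true effect: +2 years

## Average treatment effect: difference in group means
ate <- mean(survival[group == "drug"]) - mean(survival[group == "control"])
ate

## Confidence interval for (control minus drug): groups are ordered alphabetically
t.test(survival ~ group)$conf.int
```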

van Nood et al. (2013) NEJM

Mechanistic analysis

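Very few analyses aim to be mechanistic. A mechanistic analysis tries to understand the exact changes in variables that lead to exact changes in other variables for individual objects. Noisy data make this kind of analysis much harder.
Mechanistic analysis is most commonly applied in physics and engineering, where simple models can describe many phenomena.
Usually, in such analyses the only random component of the data is measurement error.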

http://www.fhwa.dot.gov/resourcecenter/teams/pavement/pave_3pdg.pdf
In the example above, we want to understand how differences and changes in pavement design directly change how the pavement performs.

What is Data? (5:15)
Data is a set of values of qualitative or quantitative variables.
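First we need a set of items: the collection of objects you are going to measure, which in statistical inference is sometimes called the population. A variable is a measurement or characteristic of an item. It can be quantitative, such as a person's height or the time someone spends on a website, or qualitative, such as which parts of the site a visitor viewed or the gender you think the visitor is.
  • Qualitative variables are things like country of origin, sex, or treatment; they are not necessarily ordered and are not necessarily measured values.
  • Quantitative variables are usually continuous, such as height, weight, and blood pressure; they are ordered within a given range.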
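In R, qualitative variables are usually stored as factors and quantitative ones as numeric vectors. A sketch with made-up values:

```r
country <- factor(c("US", "UK", "US", "JP"))    # qualitative, unordered
levels(country)                                 # the distinct categories

dose <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)                  # qualitative, but with a natural order
dose < "high"                                   # comparisons respect that order

height <- c(172.5, 165.0, 180.2, 158.3)         # quantitative, continuous (cm)
mean(height)

str(list(country = country, dose = dose, height = height))
```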

http://brianknaus.com/software/srtoolbox/s_4_1_sequence80.txt

https://dev.twitter.com/rest/reference/get/blocks/list

http://bluebuttontoolkit.healthit.gov/challenge/

How Many Computers to Identify a Cat? 16,000

Darwintunes
http://www.pnas.org/content/109/30/12081.full
https://soundcloud.com/uncoolbob/sets/darwintunes

http://www.data.gov/

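The data should fit the question you are trying to answer. Data often limit, or enable, the questions you can answer: you may come up with a question for which you have no data that can answer it, in which case you need to adjust the question, turning it into a series of answerable sub-questions or related questions. In short, if you cannot formulate a question, having data alone gets you nowhere.

Start with a question, then use the data at hand to find the answer, rather than starting with the data and only then looking for a question.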

What About Big Data? (4:15)



http://mashable.com/2011/06/28/data-infographic/

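What counts as "big data" keeps changing as time passes and technology develops.

So one way to deal with a big data problem is simply to wait until hardware catches up with the growth of the data.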
Six Degrees of Separation
Travers and Milgram (1969) Sociometry

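Stanley Milgram ran an experiment in which he chose 296 people, sent each of them a letter, and asked them to forward it to someone they knew personally, who would pass it on in turn, until the letter reached a specific target address. 64 of these chains were completed, that is, 64 of the 296 letters arrived. The experiment found that the chains between the first sender and the final recipient had about 5.2 intermediaries on average.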
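The completion rate implied by those numbers is simple arithmetic:

```r
started   <- 296                                # letters sent out
completed <- 64                                 # chains that reached the target
completion_pct <- round(100 * completed / started, 1)
completion_pct                                  # share of chains completed, in percent
```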


Leskovec and Horvitz WWW '08


Don't use Hadoop - your data isn't that big
“The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”
──John Tukey
“... no matter how big the data are.”
──Leek
Experimental Design (15:59)

http://www.nature.com/nm/journal/v12/n11/full/nm1491.html

http://arxiv.org/pdf/1010.1092.pdf

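In any experimental design or data science project, the first thing to realize is that you need to care about the analysis plan. When designing and analyzing a study, pay attention to every step, from data cleaning through analysis to reporting, so that you do not end up in a foolish, embarrassing situation. More importantly, be aware of the key points in study design where mistakes are likely to happen.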

https://nsaunders.wordpress.com/2012/07/23/we-really-dont-care-what-statistical-method-you-used/

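Whatever study you carry out, you need a plan for sharing your data and code.
Before you actually run an experiment, the first thing to do is to formulate your question.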

http://www.wired.com/2012/04/ff_abtesting

http://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture2.pdf

Chocolate Consumption, Cognitive Function, and Nobel Laureates
http://www.nejm.org/doi/full/10.1056/NEJMon1211064

Relationships like the one in the example above are sometimes called spurious correlations.

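Spurious correlations like this are often driven by confounders: variables related to both of the quantities being compared. There are several ways to deal with them:
  • Fix some of the variables: if everyone knows you held a variable constant, it cannot be a confounder.
  • Stratify on the variable.
  • If neither approach is possible, randomize.
Randomization means using a computer program or a coin flip to assign subjects to the different groups.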
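Stratification and randomization can be combined. The sketch below uses 12 hypothetical subjects with sex as the stratifying variable, and shuffles equal numbers of two arms within each stratum using `sample()`, the computer equivalent of repeated coin flips.

```r
set.seed(7)
subjects <- data.frame(
  id  = 1:12,
  sex = rep(c("F", "M"), times = c(6, 6))     # hypothetical stratifying variable
)

## Stratified randomization: within each stratum, shuffle equal numbers of "A" and "B"
subjects$arm <- NA_character_
for (s in unique(subjects$sex)) {
  idx <- which(subjects$sex == s)
  subjects$arm[idx] <- sample(rep(c("A", "B"), length.out = length(idx)))
}
table(subjects$sex, subjects$arm)             # 3 A and 3 B in each stratum

## Plain coin-flip assignment for comparison: group sizes may end up unequal
coin <- sample(c("A", "B"), size = 12, replace = TRUE)
table(coin)
```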
http://www.gs.washington.edu/academics/courses/akey/56008/lecture/lecture1.pdf

http://www.biostat.jhsph.edu/~iruczins/teaching/140.615/


http://xkcd.com/882/
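  • Good experimental design includes:
    • Replication, so you can measure variability.
    • Comparing the measured variability to the signal you care about.
    • Generalizing to the problem you care about.
    • Transparency of code and data.
  • Prediction is not the same as inference; both are important, and which one to use depends on your situation.
  • In any data science problem, be aware of data dredging (testing many hypotheses until something looks significant).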
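A sketch of why data dredging is dangerous: 20 hypothetical comparisons in which there is no real effect at all can still produce "significant" p-values by chance, and a multiple-testing correction such as Bonferroni (via `p.adjust()`) guards against this. The group sizes and the number of tests are arbitrary.

```r
set.seed(882)
## 20 hypothetical comparisons where both groups come from the same distribution
p <- replicate(20, t.test(rnorm(30), rnorm(30))$p.value)
sum(p < 0.05)                               # "discoveries" found purely by chance

p_adj <- p.adjust(p, method = "bonferroni") # correct for the 20 comparisons
sum(p_adj < 0.05)
```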

Wednesday, June 10, 2015

Coursera R Programming Week 2 Notes

R Programming
Johns Hopkins Bloomberg School of Public Health

Week 2
Study notes for Week 2

June 1, 2015 - June 29, 2015

R Programming: Week 2

Today marks the beginning of Week 2 of R Programming. This week we take the gloves off and the lectures cover key topics like control structures and functions. We also introduce the first programming assignment for the course, which is due at the end of the week.

A few notes about the Programming Assignment:

There is no limit on the number of times you can submit each part of the Assignment.
For each part, we take your maximum score over all of your submissions.
There is a submission script that you will have to download to submit your assignment.
Roger Peng and the Data Science Team
Mon 8 Jun 2015 8:01 AM CST

Week 2: Programming with R

This week is all about functions and about controlling the flow of an R program. We start with control structures (like if-else, and for loops) and then move on to writing functions. Next, we discuss the lexical scoping features of the language and how they can be used in interesting ways, particularly for statistical applications.
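A compact sketch of the control structures and of a user-defined function; the values are arbitrary and `above()` is a hypothetical helper.

```r
x <- 7
size <- if (x > 5) "big" else "small"   # if-else is an expression that returns a value

squares <- numeric(5)
for (i in seq_along(squares)) {         # for loop over the indices 1 to 5
  squares[i] <- i^2
}

n <- 1
while (n < 100) {                       # while loop: keep doubling until n >= 100
  n <- n * 2
}

k <- 0
repeat {                                # repeat runs until an explicit break
  k <- k + 1
  if (k >= 3) break
}

## A function returns the value of its last evaluated expression
above <- function(v, limit = 10) {
  v[v > limit]
}
above(c(3, 12, 25, 7))
```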

Learning Objectives

By the end of this week you should be able to:
  • Write an if-else expression
  • Write a for loop, a while loop, and a repeat loop
  • Define a function in R and specify its return value [see Functions Part 1 and Part 2]
  • Describe how R binds a value to a symbol via the search list
  • Define what lexical scoping is with respect to how the values of free variables are resolved in R
  • Describe the difference between lexical scoping and dynamic scoping rules
  • Convert a character string representing a date/time into an R datetime object. [see Dates and Times]
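The difference between lexical and dynamic scoping shows up when a function uses a free variable. In this sketch (arbitrary values), `g()` refers to `y` without defining it, and R looks `y` up in the environment where `g()` was defined, not where it was called. The date lines at the end illustrate the last objective using the course's own start and end dates.

```r
y <- 10
g <- function(x) x * y    # y is a free variable in g: neither an argument nor local

f <- function(x) {
  y <- 2                  # a local y inside f
  y^2 + g(x)
}

f(3)   # lexical scoping: g finds y = 10 where it was defined, so 4 + 30
       # dynamic scoping would use the caller's y = 2 instead, giving 4 + 6

search()   # the search list R consults after the global environment

## Converting strings to dates and times
span <- as.Date("2015-06-29") - as.Date("2015-06-01")
as.numeric(span)                                    # length of the course in days
stamp <- as.POSIXct("2015-06-08 08:01:00", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M")
unclass(as.Date("1970-01-02"))                      # Dates count days since 1970-01-01
```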

Programming

There is a graded programming assignment for this week.
  • Programming assignment 1: Air Pollution
For those interested in a bit of a "warm up" to this programming assignment, Derek Franks has written a very nice tutorial to help you get up to speed.


This programming assignment has multiple parts and you will submit your answers using the submit script described in the instructions.

Sunday, June 7, 2015

Coursera The Data Scientist’s Toolbox Week 2 Notes

The Data Scientist’s Toolbox
Johns Hopkins Bloomberg School of Public Health

Week 2
Study notes for Week 2

June 1, 2015 - June 29, 2015

Video Lectures
Week 2 (50:50)

Tips from Coursera Users - Optional Video (3:53)

Command Line Interface (16:04)

Windows: Git Bash (unless you are familiar with Git, keep all installation options at their defaults; Git Bash is for Windows only)
Mac/Linux: Terminal

/ (root): the root directory
~ (home): your home directory

CLI Commands
  • command: the command itself
  • flags: options that modify the command
  • arguments: what the command operates on

Summary of Commands
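  • pwd (print working directory): print the current working directory
  • clear: clear the terminal window
  • ls (list): list the files and subdirectories in the working directory
  • ls -a: list all files and directories, including hidden ones
  • ls -al: list all files and directories with detailed information
  • cd "directory path" (change directory): change the working directory; with no argument, cd goes to your home directory
  • cd .. : go up to the parent directory
  • mkdir "directory name" (make directory): create a directory
  • touch "file name": create an empty file
  • cp "file name" "destination directory" (copy): copy a file
  • cp -r "directory name" "destination directory" (copy): copy a directory and its contents
  • rm "file name" (remove): delete a file
  • rm -r "directory name" (remove): delete a directory and its contents
  • mv "file name" "destination directory" (move): move a file
  • mv "file name" "new file name" (rename): rename a file
  • echo "text": print its arguments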
$ echo Hello World!
Hello World!
  • date: print the current date and time
$ date
Sun Jun  7 13:00:00     2015

Introduction to Git (4:49)

Version Control


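A version control system records the changes you make to a file or set of files over time, so that you can recall specific versions later.

Git is a free, open-source version control system, and currently one of the most popular and widely used.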

https://git-scm.com/book/en/v2/Getting-Started-A-Short-History-of-Git

$ git config --global user.name "Your Name Here"
$ git config --global user.email "your_email@example.com"
You only need to do this once, but you can change these settings at any time.

$ git config --list  ## shows your user name, email, and other settings

$ exit  ## exit Git Bash


Introduction to Github (3:53)

Git = Local (on your computer); GitHub = Remote (on the web)

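GitHub is a web hosting service for software development projects that uses the Git version control system at its core. It lets you develop projects online and publish them so that other people can view them and contribute.

In other words, it lets you push and pull repositories: a local repository under Git control can be pushed to a remote repository on the web, or pulled back from it. GitHub also gives every user a profile page listing all of that user's repositories, and the repositories are backed up on GitHub's servers, which protects you if something happens to your local copy.

But GitHub's real core is its social features: users can follow each other and share and collaborate on each other's projects.

※ Use the same email address for your GitHub account as for your Coursera account.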

Creating a Github Repository (5:51)

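Creating a repository (repo)
  • Create a brand-new repository:
  1. https://github.com/new
  2. Or create one from the top-right corner of your profile page (https://github.com/yourUserNameHere).

※ At the time (2015), free accounts could only create public repositories.
 Remember to check the "Initialize this repository with a README" box.

Now you can make a copy on your local computer: open Git Bash and create a folder to hold the local copy of the repository.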


  • Fork another user's repository
Forking lets you collaborate on software with other people: it creates a copy of that repository under your own account.


git clone https://github.com/yourUserNameHere/repoNameHere.git

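This command fetches a copy of the repository from the remote server and places it in a new folder, named after the repository, inside your current working directory.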

https://help.github.com/articles/fork-a-repo/
https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

Basic Git Commands (5:52)

http://gitready.com/beginner/2009/01/21/pushing-and-pulling.html

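git add .: stage all new and modified files in the current directory (since Git 2.0 this also stages deletions)
git add -u: stage modifications and deletions of files Git already tracks (new files are not included)
git add -A: stage everything; the combination of the two commands above
git commit -m "message": record the staged changes with a message that describes them; this is a purely local operation and does not update GitHub
git push: push your commits to GitHub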

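Differences between Fork and Branch (http://wp.chunhsin.idv.tw/?p=4179)
  1. A fork is a separate copy, and that copy is itself a complete repository.
  2. According to the official documentation, you fork when you want to start your own project from someone else's repository or to contribute to someone else's project; in other words, a copy of a repository owned by another account is usually a fork.
  3. For your own repository, the right approach is to use a branch.
  4. A fork can contain branches, but a branch cannot contain a fork.
  5. Both forks and branches can be merged back into the main version. The only difference is that with a fork you send a merge request (pull request) to the original author, who must approve it, whereas a branch belongs to your own repository, so no separate approval is needed.
git checkout -b "branchname": create a new branch and switch to it
git branch: list the branches (the current one is marked with *)
git checkout master: switch back to the master branch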

Merging a fork or branch through a pull request is done on GitHub (locally, branches can also be merged with git merge).



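If you are merging into someone else's repository, they will be notified; if they accept your changes, your pull request will be merged into their repository.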

Basic Markdown (2:22)

Markdown (.md) is a plain-text file format written with a few simple formatting conventions. GitHub, R, and RStudio all recognize it.

Heading

## This is a secondary heading  // level-2 heading
### This is a tertiary heading  // level-3 heading

* first item in list  // first item of an unordered list
* second item in list // second item of an unordered list
* third item in list  // third item of an unordered list

Getting markdown help

Installing R Packages (5:37)

http://cran.r-project.org/mirrors.html

http://www.bioconductor.org/ (packages for biology and large-scale data)

> a <- available.packages()
> head(rownames(a), 3)  ## Show the names of the first few packages
[1] "A3"          "abc"         "ABCanalysis"

http://cran.r-project.org/web/views/

> install.packages("slidify")  ## Installing an R Package

> install.packages(c("slidify", "ggplot2", "devtools"))

> source("http://bioconductor.org/biocLite.R")  ## 2015-era installer; current Bioconductor uses BiocManager::install()
> biocLite()
> biocLite(c("GenomicFeatures", "AnnotationDbi"))
library(): tells R which package to load

> library(ggplot2)
> search()  ## the search path: ggplot2 is now attached (ls("package:ggplot2") lists its functions)
 [1] ".GlobalEnv"        "package:ggplot2"   "tools:rstudio"    
 [4] "package:stats"     "package:graphics"  "package:grDevices"
 [7] "package:utils"     "package:datasets"  "package:methods"  
[10] "Autoloads"         "package:base" 
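A common pattern is to install only the packages that are missing. This is a sketch: `missing_pkgs()` is a hypothetical helper, and the actual installation is guarded by `interactive()` so that running the script non-interactively does not touch the network.

```r
## Return the subset of `pkgs` that is not installed yet
missing_pkgs <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

wanted <- c("ggplot2", "devtools")
todo <- missing_pkgs(wanted)
if (length(todo) > 0 && interactive()) {
  install.packages(todo)            # downloads from the selected CRAN mirror
}

## Attach a package only if it is actually available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
}
```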

Installing Rtools (2:29)

This section is for Windows users only.

Rtools is a collection of tools needed to build R packages on Windows.
> find.package("devtools")
> install.packages("devtools")

> library(devtools)

Then run find_rtools(); it should return TRUE.

Thursday, June 4, 2015

Coursera R Programming Week 1 Notes

R Programming
Johns Hopkins Bloomberg School of Public Health

Week 1
Course introduction and Week 1 study notes

June 1, 2015 - June 29, 2015

Learn how to program in R and how to use R for effective data analysis. This is the second course in the Johns Hopkins Data Science Specialization.

Course categories
Information, Tech & Design
Statistics and Data Analysis

Instructors

Roger D. Peng, PhD - Johns Hopkins University
Jeff Leek, PhD - Johns Hopkins University
Brian Caffo, PhD - Johns Hopkins University

Course information
Length: 4 weeks, 7-9 hours/week
Language: English
Subtitles: English, Spanish, Chinese

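Course overview
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software needed for a statistical programming environment, and the course describes generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. Topics in statistical data analysis will provide working examples.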

Syllabus
  • Week 1: Overview of R, R data types and objects, reading and writing data
  • Week 2: Control structures, functions, scoping rules, dates and times
  • Week 3: Loop functions, debugging tools
  • Week 4: Simulation, code profiling
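As a preview of the Week 1 topics, here is a sketch of R's basic atomic types and a round trip of a small made-up data frame through a CSV file; the file goes to a temporary path so nothing is left behind.

```r
## Basic atomic types
num <- 3.14       # numeric (double)
int <- 5L         # integer (note the L suffix)
chr <- "R"        # character
lgl <- TRUE       # logical
cpx <- 1 + 2i     # complex
sapply(list(num, int, chr, lgl, cpx), class)

## Writing a data frame to disk and reading it back
df <- data.frame(id = 1:3, city = c("Baltimore", "Taipei", "Boston"),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
df2 <- read.csv(path, stringsAsFactors = FALSE)
str(df2)
```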

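Prerequisites
Some familiarity with programming concepts and a basic understanding of statistical reasoning will help: The Data Scientist's Toolbox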

References
The e-book R Programming for Data Science covers all of the material presented in this course. It is available for download from Leanpub.

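Course format
Each week includes videos, quizzes, and programming assignments.

As part of this course you will need to set up a GitHub account. GitHub is a tool for sharing code and editing it collaboratively. During this course and the other courses in the Specialization, you will submit links to files you place publicly in your GitHub account as part of peer-graded assignments. If you are concerned about revealing your identity, register an anonymous GitHub account, and be careful not to include any information you do not want the classmates grading your work to see.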

Data Science Specialization Community Site

Since the beginning of the Data Science Specialization, we've noticed the unbelievable passion students have for our courses and the generosity they show toward each other on the course forums. A couple of students have created quality content around the subjects we discuss, and many of these materials are so good we feel that they should be shared with all of our students.

We're excited to announce that we've created a site using GitHub Pages: http://datasciencespecialization.github.io/ to serve as a directory for content that the community has created. If you've created materials relating to any of the courses in the Data Science Specialization, please send us a pull request so we can add a link to your content on our site. You can find out more about contributing here: https://github.com/DataScienceSpecialization/DataScienceSpecialization.github.io#contributing

We can't wait to see what you've created and where the community can take this site!
- The JHU Data Science Lab Team
Thu 4 Jun 2015 4:30 AM CST

R Programming: Week 1

As you browse the course web site, please make sure to read through the syllabus which contains important information about the grading policy for quizzes and programming assignments as well as the course schedule. 

Please pay particular attention to the differences among the various Programming Assignments. Whereas Programming Assignments 1 and 3 are graded via unit tests that use a submission script that will compare the output of your functions to the correct output, Programming Assignment 2 requires that you submit R code for evaluation and grading by your classmates.  

This week will cover the basics to get you started up with R. There are videos demonstrating how to install R on Windows and Mac. The Week 1 videos will cover the history of R and S, go over the basic data types in R, and describe the functions for reading and writing data. I recommend that you watch the videos in the order that they are listed on the web page, but watching the videos out of order isn't going to ruin the story. For each lecture video you can download a separate PDF document of the slides (the demonstration videos don't have slides associated with them).
Roger Peng and the Data Science Team
Mon 1 Jun 2015 9:01 PM CST

R Programming: Welcome to swirl

In this course, you have the option to use the swirl R package to practice some of the concepts we cover in lectures. swirl teaches you R programming and data science interactively, at your own pace, and right in the R console.

Each lesson that you complete in swirl is worth one extra credit point. However, the maximum number of points you may earn for the assignment is capped at 5. While these lessons will give you valuable practice and you are encouraged to complete as many as possible, please note that they are completely optional and you can get full marks in the class without completing them.

You can find the instructions for how to install and use swirl in the Programming Assignments section of the course under Week 1. Have fun!
Roger Peng and the Data Science Team
Mon 1 Jun 2015 9:01 PM CST

R Programming: Pre-Course Survey

Thanks for signing up for R Programming. As you probably know, this course is part of the Data Science Specialization, a sequence of nine massive open online courses (MOOCs) plus a Capstone project. We would like to learn more about your motives for taking this course and your intentions for both this course and the overall specialization. To help us out, please complete our short pre-course survey. It should only take about 3 minutes of your time.

Thanks,
The Data Science Team
Mon 1 Jun 2015 9:01 PM CST

Assessments
Quizzes

There will be one quiz every week. The quizzes will all open on the first day of the course but they will be due weekly. So the Week 1 Quiz will be due at the end of the first week and the Week 2 Quiz will be due at the end of the second week, etc.

Please refer to the individual weekly Quiz deadlines to see the exact date and time that each Quiz is due.

Programming Assignments

There will be three required programming assignments. The first programming assignment is due at the end of the second week. Subsequent programming assignments are due weekly after that.

Programming Assignments 1 and 3 will be graded via unit tests using a submission script that will compare the output of your functions to the correct output. To access Programming Assignments 1 and 3, click the corresponding link in the left navigation bar.

Programming Assignment 2 will be submitted differently and graded via a peer assessment. To access Programming Assignment 2, click the corresponding link in the left navigation bar.

swirl Programming Assignment (optional)

In this course, you have the option to use the swirl R package to practice some of the concepts we cover in lectures.

Each lesson that you complete in swirl is worth one extra credit point. However, the maximum number of points you may earn for the assignment is capped at 5. While these lessons will give you valuable practice and you are encouraged to complete as many as possible, please note that they are completely optional and you can get full marks in the class without completing them.

You can find the instructions for how to install and use swirl in the Programming Assignments section of the course under Week 1.

Grading
Quizzes

You may attempt each quiz up to 3 times. The score of your most successful attempt will count toward your grade.

Programming Assignments

Programming assignments 1 and 3 will require submissions via a submission script. You may make an unlimited number of submissions for programming assignments 1 and 3, and your most successful submission will count toward your grade. The swirl Programming Assignment is completely optional and counts as extra credit.

Hard deadlines and soft deadlines for Quizzes 1-3 and Programming Assignment 1

The reported due date is the soft deadline for quizzes 1-3 and programming assignment 1. You may turn in quizzes 1-3 and programming assignment 1 up to five days after the soft deadline; the hard deadline is 23:30 UTC, five days after the due date. If you submit after the due date (but before the hard deadline), your submission score will be penalized by 10% for each day after the due date. If you use a late day, the 10% penalty will not be applied for that day.

Please note: There is no partial credit grace period for Quiz 4 or Programming Assignments 2 and 3. Those deadlines are firm, and work submitted after the hard deadline will not receive credit.

Late Days for Quizzes and Programming Assignment 1

You are permitted a total of 5 late days for quizzes and assignments in the course. If you use a late day, your quiz or assignment grade will not be affected if it is submitted late.

No Late Days for Programming Assignment 2

Peer assessment deadlines have to be synchronous. Therefore, Late Days cannot be applied to Programming Assignment 2. Only one deadline can be set for students to submit and peer-grade each other's work. This is necessary in order to maintain a synchronized peer grading process.

Points and scoring

There are 100 points available in the course. The breakdown of points is as follows:
  • Week 1 Quiz - 20 points
  • Week 2 Quiz - 10 points
  • Week 3 Quiz - 5 points
  • Week 4 Quiz - 10 points
  • Programming Assignment 1 (Air Pollution) - 20 points
  • Programming Assignment 2 (Lexical Scoping) - 10 points
  • Programming Assignment 3 (Hospital Quality) - 25 points
  • swirl Programming Assignment - Maximum of 5 extra credit points
You must earn 70 points to pass the course and earn a certificate. Students who earn 90 points and above will receive a certificate with Distinction.
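The point breakdown above adds up to 100, with swirl worth up to 5 extra points on top. A small sketch that checks the total and maps a score to the outcomes described; `outcome()` is a hypothetical helper, not part of the course.

```r
max_points <- c(quiz1 = 20, quiz2 = 10, quiz3 = 5, quiz4 = 10,
                pa1 = 20, pa2 = 10, pa3 = 25)
sum(max_points)                 # regular points available; swirl adds up to 5 extra

## Hypothetical helper: map a total score to the outcome described above
outcome <- function(total) {
  if (total >= 90) {
    "certificate with Distinction"
  } else if (total >= 70) {
    "certificate"
  } else {
    "no certificate"
  }
}

## Example: full marks on quizzes 1-2 and assignments 1 and 3 only (75 points)
outcome(sum(max_points[c("quiz1", "quiz2", "pa1", "pa3")]))
```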